Abstract



In this article we present a review of the structure of the proton and the current status of 
our knowledge of the parton distribution functions (PDFs). The lepton-nucleon scattering 
experiments which provide the main constraints in PDF extractions are introduced and 
their measurements are discussed. Particular emphasis is given to the HERA data which 
cover a wide kinematic region. Hadron-hadron scattering measurements which provide 
supplementary information are also discussed. The methods used by various groups to 
extract the PDFs in QCD analyses of hard scattering data are presented and their results are 
compared. The use of existing measurements allows predictions for cross sections at the 
LHC to be made. A comparison of these predictions for selected processes is given. First 
measurements from the LHC experiments are compared to predictions and some initial 
studies of the impact of this new data on the PDFs are presented. 



Submitted to Reports on Progress in Physics. 



1 Introduction 



The birth of modern experimental particle physics in which particles were used to probe the 
structure of composite objects began with the famous alpha particle scattering experiment of 
Geiger and Marsden under the direction of Rutherford. In 1911 Rutherford published an analy- 
sis of the data providing evidence for atomic structure consisting of a massive positively charged 
nucleus surrounded by electrons [1]. Since then the use of particle probes to deduce structure 
has become standard, albeit at increasingly higher energy and intensity which brings its own 
technological and experimental challenges. 

Experiments of point-like electrons scattering off extended objects such as nuclei were expected
to deviate from the predictions of Mott scattering, i.e. relativistic Coulomb scattering of an
electron off a point-like charge [2]. This deviation, the nuclear form factor (expressed in terms
of the 4-momentum transfer Q between initial and final state electrons), was shown to be related
to the Fourier transform of the nuclear charge density [3,4]. In 1955 Hofstadter measured nuclear
form factors with a 100-500 MeV electron beam and obtained the charge density of the proton and
other atomic nuclei [5-7]. The experiment was able to resolve the proton's charge radius to
~0.7 fm, at least an order of magnitude better than Rutherford's experiment.

The idea that nucleons were composite particles was first proposed in 1964 by Zweig [8] and 
Gell-Mann [9]. Their quark model represented an underlying schema to classify the static 
properties of the known hadrons. However, this model had difficulties explaining why direct 
production of quarks did not occur. 

Detailed study of the structure of the proton advanced in 1967 when a 20 GeV linear electron
accelerator commenced operation at the Stanford Linear Accelerator Centre (SLAC) with the
aim of studying inelastic proton scattering, resonance production and the inelastic continuum in
the region of $0.7 < Q^2 < 25$ GeV$^2$. This opened the field of deep inelastic scattering (DIS) in
which the nucleon target was dissociated to large invariant mass states in the interaction. First
observations of elastic scattering showed a rapid $1/Q^4$ behaviour of the cross section [10] as
expected from earlier low energy elastic form factor measurements at SLAC, Cornell, DESY
and CEA [11-16]. This was found to be in stark contrast to the weaker $Q^2$ dependence of the
inelastic cross section in the same energy range [17, 18].

For inelastic Coulomb scattering two form factors are required to describe the cross section,
the so-called structure functions, which at fixed lepton beam energy can only depend on two
kinematic quantities taken to be $Q^2$ and the electron energy loss in the nucleon rest frame,
$\nu$ [19]. The SLAC structure function measurements were found to exhibit scaling behaviour, i.e.
they were independent of $Q^2$. This behaviour had been predicted by Bjorken [20] in the same year.

The new SLAC data prompted Feynman to develop the parton model of deep inelastic scatter- 
ing [21] in which the scaling behaviour is naturally explained as the point-like elastic scattering 
of free partons within the protons. Bjorken and Paschos then further developed the quark-parton 
model [22]. In the same year Callan and Gross showed that the behaviour of the longitudinally
polarised part of the virtual photon scattering cross section required the constituents to be
spin-1/2 fermions [23]. The association of these point-like constituents as the quarks of Gell-Mann
and Zweig was gradually made and widely accepted by 1974.

The inability to observe free quarks (confinement) and the free quarks of the parton model
was a contradiction that was solved through the idea of a scale dependent coupling which was
large at low energy and feeble at high energies [24, 25]. This led to the rapid development of
quantum chromodynamics (QCD) which was soon established as the correct theory of strong
interactions.

A wider programme of scattering experiments followed providing detailed insight into the struc- 
ture of the proton and electroweak (EW) interactions between the quark and lepton sectors of 
the Standard Model. First observations of weak neutral currents by the Gargamelle
neutrino-nucleon experiment [26,27] were made in 1973, scaling violations of DIS cross sections
were observed in 1974 [28], and the discovery of the gluon was made in 1979 by the TASSO $e^+e^-$
experiment [29] at DESY.

In 1993-2007 HERA, the only ep collider, operated at DESY and opened up a wide kinematic
region for precision measurements of proton structure and was capable of resolving structures
to $10^{-3}$ fm. The precise HERA data together with fixed target DIS measurements and data from
hadro-production experiments strongly constrain the proton's parton distributions.

Experiments at CERN, DESY, SLAC and JLab (see for example [30-35]) have extensively 
studied polarised DIS to understand how the proton's spin arises from the orbital and intrinsic 
angular momenta of the constituent partons. In this article we omit any discussion of spin 
although a review of the field can be found for example in [36, 37]. 

The knowledge that we now have of QCD and proton structure is a vital tool in helping
disentangle and interpret potential signals of new physics at the Large Hadron Collider (LHC) at
CERN which has commenced operation colliding protons at a centre-of-mass energy $\sqrt{s}$ of up
to 8 TeV, and is expected to reach the design value of about 14 TeV in the next few years.

In this report we review the current status of our understanding of proton structure. In sec- 
tion 1.1 and the remaining sections of this chapter the formalism for deep inelastic scattering is 
given and the parton distribution functions (PDFs) are introduced. A more formal introduction 
can be found for example in [38]. The experimental constraints on the proton structure measure- 
ments are discussed in detail in chapter 2 including data from non-DIS experiments. Chapter 3 
provides an overview of the methods used to extract the PDFs from the various experimental 
measurements. Finally in chapter 4 the potential of the LHC in constraining our knowledge of 
proton structure is discussed. 

1.1 Kinematic Quantities in DIS 

In this section we outline the general formalism for unpolarised deeply inelastic lepton-nucleon
scattering in perturbative QCD. Inclusive neutral current (NC) scattering of a charged lepton
$l$ off a nucleon $N$ proceeds via the reaction $l^\pm N \to l^\pm X$ and the exchange of virtual neutral
electroweak vector bosons, $\gamma$ or $Z^0$. Here $X$ represents any final state. The purely weak charged
current (CC) process is $l^\pm N \to \overset{(-)}{\nu} X$ and occurs via exchange of a virtual $W^\pm$ boson.

The measured cross sections are usually expressed in terms of three variables $x$, $y$ and $Q^2$
defined as

$$ x = \frac{Q^2}{2\, p \cdot q} , \qquad y = \frac{p \cdot q}{p \cdot k} , \qquad Q^2 \equiv -q^2 = -(k - k')^2 , \tag{1} $$

where $k$ and $k'$ are the momenta of the initial and scattered lepton, $p$ is the nucleon momentum,
and $q$ the momentum of the exchanged boson (see Fig. 1). The first of these, also known
as $x_{\rm Bjorken}$, is the fraction of the target nucleon's momentum taken by the parton in the infinite
momentum frame where the partons have zero transverse momenta. The inelasticity is measured
by the quantity $y$ and is the fractional energy loss of the lepton in the rest frame of the target
nucleon. It also quantifies the lepton-parton scattering angle $\theta^*$ measured with respect to the
lepton direction in the centre-of-mass frame since $y = \frac{1}{2}(1 - \cos\theta^*)$. The 4-momentum transfer
squared from the lepton, $Q^2$, quantifies the virtuality of the exchanged boson. The three quantities
are related to the lepton-nucleon centre-of-mass energy, $\sqrt{s}$, via $Q^2 = sxy$ which holds in the
massless approximation since $s = (k + p)^2$. Thus for fixed centre-of-mass energy the cross
sections are dependent on only two quantities. Modern experiments typically publish measured
cross sections differentially in two variables, usually $x$ and $Q^2$ or $x$ and $y$.
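As a numerical illustration of these definitions, the following minimal Python sketch computes $x$, $y$ and $Q^2$ from four-vectors and allows the relation $Q^2 = sxy$ to be checked in the massless limit. The function names (`dot`, `dis_kinematics`), the beam energies (27.6 GeV leptons on 920 GeV protons) and the scattered lepton kinematics are assumptions chosen for illustration only; they are not taken from the text.

```python
import math

def dot(a, b):
    """Minkowski product with metric (+,-,-,-); vectors are (E, px, py, pz)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def dis_kinematics(k, k_prime, p):
    """Return (x, y, Q2, s) for incoming lepton k, scattered lepton k', nucleon p."""
    q = tuple(ki - kpi for ki, kpi in zip(k, k_prime))  # exchanged boson momentum
    Q2 = -dot(q, q)                                     # virtuality Q^2 = -q^2
    x = Q2 / (2.0 * dot(p, q))                          # Bjorken x
    y = dot(p, q) / dot(p, k)                           # inelasticity
    total = tuple(ki + pi for ki, pi in zip(k, p))
    s = dot(total, total)                               # s = (k + p)^2
    return x, y, Q2, s

# Assumed example: massless beams colliding head-on along the z axis (GeV).
E_e, E_p = 27.6, 920.0
k = (E_e, 0.0, 0.0, -E_e)
p = (E_p, 0.0, 0.0, E_p)

# Assumed scattered lepton: 20 GeV at a polar angle of 150 degrees w.r.t. +z.
E_s, theta = 20.0, math.radians(150.0)
k_prime = (E_s, E_s * math.sin(theta), 0.0, E_s * math.cos(theta))

x, y, Q2, s = dis_kinematics(k, k_prime, p)
print(f"x = {x:.5f}, y = {y:.4f}, Q2 = {Q2:.2f} GeV^2, s*x*y = {s * x * y:.2f} GeV^2")
```

Because all particles are treated as massless here, the printed $Q^2$ and $sxy$ should agree up to rounding.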

Figure 1: Left: Schematic diagram of NC DIS of a charged lepton $l$ with incoming and outgoing
momenta $k$ and $k'$ interacting with nucleon $N$ with momentum $p$ via the exchange of a virtual
$Z/\gamma$ boson ($q$) between the lepton and a parton carrying fractional momentum $xp$ (see text).
Right: Schematic of CC neutrino induced DIS via the exchange of a virtual $W^\pm$ boson.

Two other related kinematic quantities are also sometimes used and are given here for
completeness. They are

$$ W^2 = (q + p)^2 = Q^2\, \frac{1 - x}{x} + m_N^2 , \qquad \nu = \frac{p \cdot q}{m_N} . \tag{2} $$

$W^2$ is the invariant mass squared of the final hadronic system, and $\nu$ is the lepton energy loss in
the rest frame of the nucleon, with $m_N$ the nucleon mass.



1.2 The Quark Parton Model 

The Quark Parton Model (QPM) [21,22] describes nucleons as consisting of massless point-like
spin-1/2 quarks which are free within the nucleon. The gluon is completely neglected, but despite
this failing it is nevertheless a useful conceptual model with which to illustrate a discussion
of proton structure. Nucleon and quark masses are also neglected, an approximation that is
valid provided the momentum scale of the scattering process $Q$ is large enough. The parton
distribution functions $f_i(x)$ in the QPM are number densities of parton flavour $i$ with fraction
$x$ of the parent nucleon's energy and longitudinal momentum. Often the momentum weighted
distributions $x f_i(x)$ are used. In standard notation the anti-quark PDFs are denoted $x\bar{f}_i(x)$,
and PDFs for each quark flavour are written as $u$, $d$, $s$, $c$, $b$ for the up, down, strange, charm and
bottom respectively. The PDFs obey counting sum rules which for the proton are written as



$$ \int_0^1 [u(x) - \bar{u}(x)]\, dx = 2 , \qquad \int_0^1 [d(x) - \bar{d}(x)]\, dx = 1 \tag{3} $$

$$ \text{and} \quad \int_0^1 [q(x) - \bar{q}(x)]\, dx = 0 \quad \text{for } q = s, c, b \tag{4} $$

and require the valence structure of the proton to correspond to $uud$. The valence distributions
are defined as $u_v = u - \bar{u}$ and $d_v = d - \bar{d}$. The constraint of momentum conservation in the
QPM is written as

$$ \int_0^1 \sum_i x\, [q_i(x) + \bar{q}_i(x)]\, dx = 1 \tag{5} $$

where the sum runs over all active parton flavours $n_f$.
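The counting sum rules can be illustrated numerically with toy valence densities. The sketch below assumes a simple shape $q_v(x) = A\, x^{a-1} (1-x)^b$, with the normalisation $A$ fixed by the counting rule through the Euler beta function. The shape parameters, the helper names (`beta`, `make_valence`, `integrate`) and the midpoint integration are illustrative assumptions and do not correspond to any fitted PDF set.

```python
import math

def beta(a, b):
    """Euler beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def make_valence(n_quarks, a, b):
    """Toy valence density q_v(x) = A x^(a-1) (1-x)^b normalised to n_quarks."""
    A = n_quarks / beta(a, b + 1.0)  # from int_0^1 q_v(x) dx = n_quarks
    return lambda x: A * x ** (a - 1.0) * (1.0 - x) ** b

def integrate(f, n=200000):
    """Midpoint rule on (0, 1); never evaluates f at the end points."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

# Assumed, purely illustrative shape parameters.
u_v = make_valence(2, a=0.7, b=3.0)
d_v = make_valence(1, a=0.7, b=4.0)

n_u = integrate(u_v)  # counting sum rule for u_v, close to 2
n_d = integrate(d_v)  # counting sum rule for d_v, close to 1
mom_valence = integrate(lambda x: x * (u_v(x) + d_v(x)))  # valence momentum fraction

print(n_u, n_d, mom_valence)
```

With these assumed shapes the valence quarks carry well under the full proton momentum, so the momentum sum rule of Eq. 5 can only be satisfied once further partons are included.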

Deeply inelastic lepton-nucleon scattering cross sections are calculated from incoherent sums
of elastic lepton-parton processes. More generally, the hadronic interaction cross section for a
process $A + B \to X$ can be written as

$$ \sigma_{AB \to X} = \sum_{i,j} \int\!\!\int f_i^A(x_1)\, f_j^B(x_2)\, \hat{\sigma}_{ij \to X}\; dx_1\, dx_2 + [x_1 \leftrightarrow x_2] \tag{6} $$

where $\hat{\sigma}_{ij \to X}$ is the partonic cross section for interactions of two partons with flavour $i$ and
$j$. The fact that the PDFs in Eq. 6 are universal is known as the factorisation property: PDFs
extracted from an analysis of e.g. inclusive DIS measurements can be used to calculate the
cross sections of other processes in lepton-hadron or hadron-hadron interactions. A proof of the
factorisation theorem in perturbative QCD can be found in [39].

The QPM represents the lowest order approximation of QCD and as such does not take into 
account gluon contributions to the scattering process. This implies a scale (Q) invariance to 
all QPM predictions known as Bjorken scaling - the quarks are free within the nucleon and 
thus do not exchange momenta. Early DIS measurements [17, 18] demonstrated approximate 
scaling behaviour for x ~ 0.3 indicating the scattering of point-like constituents of the proton, 
but subsequent measurements extended over a wider x range showed that scaling behaviour is 
violated [ L8]. This is interpreted as being due to gluon radiation suppressing high x partons and 
creating a larger density of low x partons with increasing $Q^2$. Thus in the absence of scaling,
the PDFs $f_i(x)$ become $Q^2$ dependent, $f_i(x, Q^2)$.

It is usual to consider the neutron PDFs ($f_i^n$) as being related to the proton PDFs ($f_i^p$) by
invoking strong isospin symmetry, i.e. that $u^p = d^n$, $d^p = u^n$, $\bar{u}^p = \bar{d}^n$, $\bar{d}^p = \bar{u}^n$ and that the PDFs
of other flavours remain the same for neutron and proton. This is used in order to exploit the
structure function measurements performed in DIS off a deuterium target. For the remainder of
this article PDFs always refer to proton PDFs.

The QPM provides a good qualitative description of scattering data. It was convincingly tested 
in the prediction of the $pp \to \mu^+\mu^- + X$ process [40]. Sum rules similar to those in Eqs. 3 and 5
such as the Gross-Llewellyn-Smith (GLS) [41], Adler [42], Bjorken [43] and Gottfried [44] sum
rules were also well predicted by the QPM (for details see [45,46]). 



1.3 DIS Formalism 

The scattering of a virtual boson off a nucleon can be written in terms of a leptonic and a
hadronic tensor with appropriate couplings of the exchanged boson to the lepton and the
partons (see for example [46]). The hadronic tensor is not calculable from first principles and is
expressed in terms of three general structure functions which for NC processes are $\tilde{F}_2$, $x\tilde{F}_3$
and $\tilde{F}_L$. The total virtual boson absorption cross section is related to the $\tilde{F}_2$ and $x\tilde{F}_3$ parts in
which both the longitudinal and the transverse polarisation states of the virtual boson contribute,
whereas only the longitudinally polarised piece contributes to $\tilde{F}_L$. These NC structure functions
can be further decomposed into pieces relating to pure photon exchange, pure $Z^0$ exchange and
an interference piece.

At the Born level the NC cross section for the process$^1$ $e^\pm p \to e^\pm X$ is given by

$$ \frac{d^2\sigma^\pm_{NC}}{dx\, dQ^2} = \frac{2\pi\alpha^2}{x Q^4} \left[ Y_+ \tilde{F}_2 \mp Y_- x\tilde{F}_3 - y^2 \tilde{F}_L \right] \tag{7} $$

where $\alpha \equiv \alpha(Q^2 = 0)$ is the fine structure constant. The helicity dependencies of the
electroweak interactions are contained in $Y_\pm = 1 \pm (1 - y)^2$.
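To make the structure of Eq. 7 concrete, the following Python sketch evaluates the Born-level NC cross section for given values of the generalised structure functions. It is a minimal sketch under stated assumptions: the function names (`y_plus_minus`, `nc_born_xsec`), the structure function values and the beam energies are invented for illustration, and the conversion to pb uses the standard factor $(\hbar c)^2 \approx 0.3894$ GeV$^2$ mb.

```python
import math

ALPHA = 1.0 / 137.035999   # fine structure constant at Q^2 = 0
GEV2_TO_PB = 0.389379e9    # (hbar c)^2 in pb GeV^2

def y_plus_minus(y):
    """Helicity factors Y_+ and Y_-, with Y_+- = 1 +- (1 - y)^2."""
    return 1.0 + (1.0 - y) ** 2, 1.0 - (1.0 - y) ** 2

def nc_born_xsec(x, Q2, s, F2, xF3=0.0, FL=0.0, lepton_charge=+1):
    """Born-level d^2 sigma / dx dQ^2 in pb/GeV^2, following the form of Eq. 7."""
    y = Q2 / (s * x)                             # from Q^2 = s x y (massless limit)
    Yp, Ym = y_plus_minus(y)
    sign = -1.0 if lepton_charge > 0 else +1.0   # -Y_- xF3 for e+, +Y_- xF3 for e-
    bracket = Yp * F2 + sign * Ym * xF3 - y ** 2 * FL
    return GEV2_TO_PB * 2.0 * math.pi * ALPHA ** 2 / (x * Q2 ** 2) * bracket

# Assumed example values (not measurements): s for 27.6 GeV x 920 GeV beams.
s = 4.0 * 27.6 * 920.0
sigma_eplus = nc_born_xsec(0.01, 100.0, s, F2=0.8, xF3=0.05, FL=0.1, lepton_charge=+1)
sigma_eminus = nc_born_xsec(0.01, 100.0, s, F2=0.8, xF3=0.05, FL=0.1, lepton_charge=-1)
print(sigma_eplus, sigma_eminus)
```

For a positive $x\tilde{F}_3$ the $e^-p$ cross section comes out larger than the $e^+p$ one, reflecting the sign change of the $x\tilde{F}_3$ term.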

The generalised proton structure functions [47], $\tilde{F}_{L,2,3}$, may be written as linear combinations of
the hadronic structure functions $F_{L,2}$, $F^{\gamma Z}_{L,2,3}$, and $F^{Z}_{L,2,3}$ containing information on QCD parton
dynamics as well as the EW couplings of the quarks to the neutral vector bosons. The function
$F_2$ is associated to pure photon exchange terms, $F^{\gamma Z}_{L,2,3}$ correspond to photon-$Z^0$ interference
and $F^{Z}_{L,2,3}$ correspond to the pure $Z^0$ exchange terms. Neglecting $\tilde{F}_L$, the linear combinations
for $e^\pm p$ scattering with arbitrary longitudinal lepton polarisation are given by

$$ \tilde{F}_2^\pm = F_2 - (v_e \pm P a_e)\, \kappa\, \frac{Q^2}{Q^2 + M_Z^2}\, F_2^{\gamma Z} + (v_e^2 + a_e^2 \pm 2 P v_e a_e)\, \kappa^2 \left[ \frac{Q^2}{Q^2 + M_Z^2} \right]^2 F_2^{Z} \tag{8} $$

$$ x\tilde{F}_3^\pm = -(a_e \pm P v_e)\, \kappa\, \frac{Q^2}{Q^2 + M_Z^2}\, xF_3^{\gamma Z} + (2 a_e v_e \pm P [v_e^2 + a_e^2])\, \kappa^2 \left[ \frac{Q^2}{Q^2 + M_Z^2} \right]^2 xF_3^{Z} \tag{9} $$

where $P$ is the degree of lepton polarisation, $a_e$ and $v_e$ are the usual leptonic electroweak axial
and vector couplings to the $Z^0$ and $\kappa$ is defined by $\kappa^{-1} = 4 \frac{M_W^2}{M_Z^2} \left( 1 - \frac{M_W^2}{M_Z^2} \right)$, $M_W$ and $M_Z$ being
the masses of the electroweak bosons.

$^1$ For DIS of a charged lepton, the example of an electron or positron beam is taken here.

The structure function $F_L$ (proportional to the longitudinally polarised virtual photon
scattering cross section, $\sigma_L$) may be decomposed in a manner similar to $F_2$ (proportional to the sum
of longitudinally and transversely polarised virtual photon scattering cross sections $\sigma_L + \sigma_T$).
Its contribution is significant only at high $y$ (see Eq. 7). The ratio $R$ defined as
$R(x, Q^2) = \sigma_L(x, Q^2)/\sigma_T(x, Q^2) = F_L/(F_2 - F_L)$ is often used instead of $F_L$ to describe the scattering
cross section. The QPM predicts that longitudinally polarised virtual photon scattering is
forbidden due to helicity conservation considerations, i.e. $F_L = 0$ for spin half partons. The fact
that $F_L$ is non-zero is a consequence of the existence of gluons and QCD as the underlying
theory.
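Since results are quoted either as $F_L$ or as $R$, it can be handy to convert between the two. The short Python sketch below simply inverts the definition $R = F_L/(F_2 - F_L)$ given above; the function names (`fl_from_r`, `r_from_fl`) and the numerical values are hypothetical and serve only as an illustration.

```python
def fl_from_r(F2, R):
    """Invert R = F_L / (F_2 - F_L) to obtain F_L = F_2 R / (1 + R)."""
    return F2 * R / (1.0 + R)

def r_from_fl(F2, FL):
    """Ratio of longitudinal to transverse cross sections, R = F_L / (F_2 - F_L)."""
    return FL / (F2 - FL)

# Assumed illustrative values, not measurements.
F2_example, R_example = 1.0, 0.25
FL_example = fl_from_r(F2_example, R_example)
print(FL_example, r_from_fl(F2_example, FL_example))
```

In the QPM limit $R = 0$ the first function returns $F_L = 0$, as expected for spin half partons.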

Over most of the experimentally accessible kinematic domain the dominant contribution to the
NC cross section comes from the electromagnetic structure function $F_2$. Only at large values
of $Q^2$ do the contributions from $Z^0$ boson exchange become important. For longitudinally
unpolarised lepton beams $\tilde{F}_2$ is the same for $e^-$ and for $e^+$ scattering, while the $x\tilde{F}_3$ contribution
changes sign as can be seen in Eq. 7.

In the QPM the structure functions $F_2$, $F_2^{\gamma Z}$ and $F_2^{Z}$ are related to the sum of the quark and
anti-quark densities

$$ [F_2, F_2^{\gamma Z}, F_2^{Z}] = x \sum_q [e_q^2, 2 e_q v_q, v_q^2 + a_q^2]\, \{q + \bar{q}\} \tag{10} $$

and the structure functions $xF_3^{\gamma Z}$ and $xF_3^{Z}$ to their difference which determines the valence
quark distributions $q_v$

$$ [xF_3^{\gamma Z}, xF_3^{Z}] = 2x \sum_q [e_q a_q, v_q a_q]\, \{q - \bar{q}\} = 2x \sum_q [e_q a_q, v_q a_q]\, q_v . \tag{11} $$

Here $e_q$ is the charge of quark $q$ in units of the positron charge and $v_q$ and $a_q$ are the vector and
axial-vector weak coupling constants of the quarks to the $Z^0$.
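The photon-exchange part of Eq. 10 is simple enough to evaluate directly. The following Python sketch computes the QPM $F_2$ from quark and anti-quark densities at a single value of $x$. The function name `f2_qpm` and the input densities are hypothetical; the numbers are arbitrary and chosen only so that the arithmetic is easy to follow, and the neutron case uses the strong isospin relations quoted earlier.

```python
# Quark electric charges in units of the positron charge.
CHARGE = {"u": 2.0 / 3.0, "d": -1.0 / 3.0, "s": -1.0 / 3.0, "c": 2.0 / 3.0, "b": -1.0 / 3.0}

def f2_qpm(x, quark, antiquark):
    """QPM F2 = x * sum_q e_q^2 (q + qbar); inputs map flavour -> density at x."""
    return x * sum(
        CHARGE[flavour] ** 2 * (quark[flavour] + antiquark.get(flavour, 0.0))
        for flavour in quark
    )

# Arbitrary illustrative densities at x = 0.5 (valence-like, no sea).
x = 0.5
proton = {"u": 2.0, "d": 1.0}
neutron = {"u": proton["d"], "d": proton["u"]}  # strong isospin: u^n = d^p, d^n = u^p

f2_p = f2_qpm(x, proton, {})
f2_n = f2_qpm(x, neutron, {})
print(f2_p, f2_n, f2_n / f2_p)
```

The charge-squared weighting means that $F_2$ alone cannot separate the individual flavours, which is one reason why deuterium and neutrino data are useful.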

For CC interactions the Born cross section may be expressed as

$$ \frac{d^2\sigma^\pm_{CC}}{dx\, dQ^2} = (1 \pm P)\, \frac{G_F^2}{4\pi x} \left[ \frac{M_W^2}{M_W^2 + Q^2} \right]^2 \left( Y_+ W_2(lp) \mp Y_- xW_3(lp) - y^2 W_L(lp) \right) \tag{12} $$

where $\sigma^+_{CC}$ ($\sigma^-_{CC}$) denotes the cross section for $e^+p$ or $\bar{\nu}p$ ($e^-p$ or $\nu p$) interactions. In the case
of (anti-)neutrino interactions, $1 \pm P = 2$ in Eq. 12. The weak coupling is expressed here as the
Fermi constant $G_F$. The CC structure functions $W_2$, $W_3$ and $W_L$ are defined in a similar manner
to the NC structure functions [48]. In the QPM (where $W_L = 0$) they may be interpreted as
sums and differences of quark and anti-quark densities and are given by

$$ W_2(e^+p) = W_2(\nu p) = x(\bar{U} + D) , \qquad xW_3(e^+p) = xW_3(\nu p) = x(D - \bar{U}) , \tag{13} $$
$$ W_2(e^-p) = W_2(\bar{\nu} p) = x(U + \bar{D}) , \qquad xW_3(e^-p) = xW_3(\bar{\nu} p) = x(U - \bar{D}) \tag{14} $$

where $U$ represents the sum of up-type, and $D$ the sum of down-type quark densities,

$$ U = u + c , \qquad \bar{U} = \bar{u} + \bar{c} , \qquad D = d + s + b , \qquad \bar{D} = \bar{d} + \bar{s} + \bar{b} . \tag{15} $$

Neutrino DIS experiments generally use a heavy target to compensate for the low interaction
rates. For an isoscalar target, i.e. with the same number of protons and neutrons, the structure
functions become, assuming $q = \bar{q}$ for $q = s, c, b$:



$$ W_2(\nu N) = W_2(\bar{\nu} N) = x(U + D + \bar{U} + \bar{D}) \tag{16} $$
$$ xW_3(\nu N) = x(u_v + d_v + 2s + 2b - 2c) \tag{17} $$
$$ xW_3(\bar{\nu} N) = x(u_v + d_v - 2s - 2b + 2c) \tag{18} $$

such that the difference between $\nu N$ and $\bar{\nu} N$ cross sections is directly sensitive to the total
valence distribution.

Some analyses of DIS cross sections present the data in terms of "reduced cross sections" where 
kinematic pre-factors are factored out to ease visualisation. Typically they are defined as: 



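$$ \tilde{\sigma}_{NC}(x, Q^2) = \frac{1}{Y_+}\, \frac{Q^4 x}{2\pi\alpha^2}\, \frac{d^2\sigma_{NC}}{dx\, dQ^2} , \qquad \tilde{\sigma}_{CC}(x, Q^2) = \frac{2\pi x}{G_F^2} \left[ \frac{M_W^2 + Q^2}{M_W^2} \right]^2 \frac{d^2\sigma_{CC}}{dx\, dQ^2} . \tag{19} $$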
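As a short worked step, inserting the Born-level NC cross section of Eq. 7 into the definition of the reduced NC cross section in Eq. 19 shows explicitly what remains once the kinematic pre-factors are removed. The expression below is a sketch that follows directly from those two equations, in the same notation:

```latex
\tilde{\sigma}^{\pm}_{NC}(x, Q^2)
  = \frac{1}{Y_+}\left[ Y_+ \tilde{F}_2 \mp Y_- x\tilde{F}_3 - y^2 \tilde{F}_L \right]
  = \tilde{F}_2 \mp \frac{Y_-}{Y_+}\, x\tilde{F}_3 - \frac{y^2}{Y_+}\, \tilde{F}_L
```

At low $y$ and for $Q^2$ well below $M_Z^2$ the last two terms are small, so the reduced NC cross section is then approximately equal to $F_2$.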



1.4 QCD and the Parton Distribution Functions 

The QPM is based on an apparent contradiction: DIS cross sections are determined from free
quarks which are nevertheless bound within the nucleon. The QPM was very successful in taking
PDFs from one scattering process and predicting cross sections for other scattering experiments,
but it has difficulties. The first of these is the failure of the model to accurately describe
violations of scaling and the scale dependence of DIS cross sections. That partons are strongly
bound into colourless states is an experimental fact, but why they behave as free particles when
probed at high momenta is not explained. The QPM is also unable to account for the full momentum
of the proton via measurements of the momentum sum rule of Eq. 5, indicating the existence of a
new partonic constituent which does not couple to electroweak probes, the gluon ($g$), which
modifies the momentum sum rule as below:

$$ \int_0^1 \left\{ \sum_i x\, [q_i(x) + \bar{q}_i(x)] + x g(x) \right\} dx = 1 \tag{20} $$

It is only by including the effects of the gluon and gluon radiation in hard scattering processes
that an accurate description of experimental data can be given. These developments led to the
formulation of quantum chromodynamics.

Any theory of QCD must be able to accommodate the twin concepts of asymptotic freedom and
confinement. The former applies at large scales ($Q$) where experiments are able to resolve the
partonic content of hadrons which are quasi-free in the high energy (short time-scale) limit, and
is a unique feature of non-abelian theories. The latter explains the strong binding of partons into
colourless observable hadrons at low energy scales (or equivalently, over long time-scales). In
1973 Gross, Wilczek and Politzer [24,25] showed that perturbative non-abelian field theories
could give rise to asymptotically free behaviour of quarks and scaling violations. By
introducing a scale dependent coupling strength $\alpha_s(Q) = g^2/4\pi$ (where $g$ is the QCD gauge field
coupling and depends on $Q$), confinement and asymptotic freedom can be accommodated in a
single theory. Perturbative QCD (pQCD) is restricted to the region where the $\alpha_s$ coupling is
small enough to allow cross sections to be calculated as a rapidly convergent power series in
$\alpha_s$. Furthermore any realistic theory of QCD must be renormalisable in order to avoid divergent
integrals arising from infinite momenta circulating in higher order loop diagrams. The
procedure chosen for removing these ultraviolet divergences fixes a renormalisation scheme and
introduces an arbitrary renormalisation scale, $\mu_R$. A convenient and widely used scheme is the
modified minimal subtraction ($\overline{\rm MS}$) scheme [ ]. When the perturbative expansion is summed
to all orders the scale dependence of observables on $\mu_R$ vanishes, as expressed by the
renormalisation group equation. For truncated summations the scale dependence may be absorbed into
the coupling, i.e. $\alpha_s \to \alpha_s(\mu_R)$.
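The scale dependence of the coupling can be illustrated with the standard one-loop solution of the renormalisation group equation, $\alpha_s(Q^2) = \alpha_s(\mu^2) / [1 + b_0\, \alpha_s(\mu^2) \ln(Q^2/\mu^2)]$ with $b_0 = (33 - 2 n_f)/12\pi$. This formula is not given in the text above and is added here as an assumption-laden sketch: the reference value $\alpha_s(M_Z) = 0.118$, the fixed number of flavours $n_f = 5$ and the function name `alpha_s_one_loop` are illustrative choices, and flavour thresholds and higher loops are ignored.

```python
import math

def alpha_s_one_loop(Q, alpha_ref=0.118, mu_ref=91.1876, n_f=5):
    """One-loop running coupling at scale Q (GeV) for a fixed number of flavours.

    alpha_ref is the assumed coupling at the reference scale mu_ref (GeV).
    """
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_ref / (1.0 + b0 * alpha_ref * math.log(Q ** 2 / mu_ref ** 2))

# The coupling decreases with increasing scale (asymptotic freedom).
for Q in (10.0, 91.1876, 1000.0):
    print(f"Q = {Q:8.2f} GeV  alpha_s = {alpha_s_one_loop(Q):.4f}")
```

The growth of the coupling towards low scales is what restricts the perturbative treatment to sufficiently hard processes.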

In addition to the problems of ultraviolet divergent integrals, infrared singularities also appear
in QCD calculations from soft collinear gluon radiation as the gluon transverse momentum
$k_T \to 0$. These singularities are removed by absorbing the divergences into redefined PDFs
within a given choice of scheme (the factorisation scheme) and choice of a second arbitrary
momentum scale $\mu_F$ (the factorisation scale). By separating out the short distance and long
distance physics at the factorisation scale $\mu_F$ the hadronic cross sections may be separated into
perturbative and non-perturbative pieces. The non-perturbative piece is not a priori
calculable, however it may be parametrised at a given scale from experimental data. This procedure
introduces a scale dependence to the PDFs. By requiring that the $\mu_F$ scale dependence of
$F_2$ vanishes in a calculation summing over all orders in the perturbative expansion a series of
integro-differential equations may be derived that relates the PDFs at one scale to the PDFs
at another given scale. These evolution equations obtained by Dokshitzer, Gribov, Lipatov,
Altarelli and Parisi (DGLAP) [50-53] are given in terms of a perturbative expansion of splitting
functions ($P_{ba}$) which describe the probability of a parent parton $a$ producing a daughter parton
$b$ with momentum fraction $z$ by the emission of a parton with momentum fraction $1 - z$. Three
leading order equations are derived for the non-singlet ($q^{NS} = q_i - \bar{q}_i$), singlet ($q^{S} = q_i + \bar{q}_i$),
and gluon distributions:

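$$ \frac{\partial q^{NS}(x, \mu_F^2)}{\partial \log \mu_F^2} = \frac{\alpha_s(\mu_R^2)}{2\pi} \int_x^1 \frac{dy}{y} \left[ q^{NS}(y, \mu_F^2)\, P_{qq}(x/y) \right] \tag{21} $$

$$ \frac{\partial q^{S}(x, \mu_F^2)}{\partial \log \mu_F^2} = \frac{\alpha_s(\mu_R^2)}{2\pi} \int_x^1 \frac{dy}{y} \left[ q^{S}(y, \mu_F^2)\, P_{qq}(x/y) + g(y, \mu_F^2)\, P_{qg}(x/y) \right] \tag{22} $$

$$ \frac{\partial g(x, \mu_F^2)}{\partial \log \mu_F^2} = \frac{\alpha_s(\mu_R^2)}{2\pi} \int_x^1 \frac{dy}{y} \left[ q^{S}(y, \mu_F^2)\, P_{gq}(x/y) + g(y, \mu_F^2)\, P_{gg}(x/y) \right] . \tag{23} $$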

The corresponding splitting functions are given, at leading order (LO), by 



$$ P_{qq}(x) = \frac{4}{3} \left[ \frac{1 + x^2}{(1 - x)_+} + \frac{3}{2}\, \delta(1 - x) \right] , \qquad P_{qg}(x) = \frac{1}{2} \left[ x^2 + (1 - x)^2 \right] , $$

$$ P_{gq}(x) = \frac{4}{3}\, \frac{1 + (1 - x)^2}{x} , \qquad P_{gg}(x) = 6 \left[ \frac{x}{(1 - x)_+} + \frac{1 - x}{x} + x(1 - x) \right] + \left[ \frac{11}{2} - \frac{n_f}{3} \right] \delta(1 - x) \tag{24} $$

where $[f(x)]_+ = f(x) - \delta(1 - x) \int_0^1 f(y)\, dy$. The DGLAP evolved PDFs then describe the
PDFs integrated over transverse momentum $k_T$ up to the scale $\mu_F$. The splitting functions have
also been calculated at next-to-leading order (NLO) [54] and more recently at
next-to-next-to-leading order (NNLO) [55].
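A useful cross-check of the LO splitting functions is that they conserve quark number and total momentum. The Python sketch below verifies this numerically using the standard LO expressions, with the plus prescription handled through $\int_0^1 g(x)/(1-x)_+\, dx = \int_0^1 [g(x) - g(1)]/(1-x)\, dx$. The function names, the choice $n_f = 4$ and the midpoint integration are assumptions made for illustration; the factor $2 n_f$ in the gluon check counts the quark and anti-quark flavours into which a gluon can split.

```python
N_F = 4  # assumed number of active flavours for this check

def integrate(f, n=20000):
    """Midpoint rule on (0, 1); never evaluates f at x = 0 or x = 1."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def moment_pqq(N):
    """int_0^1 x^(N-1) P_qq(x) dx with the plus prescription applied analytically."""
    # g(x) = x^(N-1) (1 + x^2), so g(1) = 2 is subtracted before dividing by (1 - x).
    regular = integrate(lambda x: (x ** (N - 1) * (1.0 + x * x) - 2.0) / (1.0 - x))
    return 4.0 / 3.0 * (regular + 1.5)  # the 3/2 comes from the delta function term

def moment_pgq(N):
    return integrate(lambda x: x ** (N - 1) * 4.0 / 3.0 * (1.0 + (1.0 - x) ** 2) / x)

def moment_pqg(N):
    return integrate(lambda x: x ** (N - 1) * 0.5 * (x * x + (1.0 - x) ** 2))

def moment_pgg(N, n_f=N_F):
    # g(x) = x^(N-1) * x for the x / (1 - x)_+ term, so g(1) = 1 is subtracted.
    plus = integrate(lambda x: (x ** (N - 1) * x - 1.0) / (1.0 - x))
    rest = integrate(lambda x: x ** (N - 1) * ((1.0 - x) / x + x * (1.0 - x)))
    return 6.0 * (plus + rest) + (11.0 / 2.0 - n_f / 3.0)

quark_number = moment_pqq(1)                              # number conservation
quark_momentum = moment_pqq(2) + moment_pgq(2)            # momentum of a parent quark
gluon_momentum = moment_pgg(2) + 2 * N_F * moment_pqg(2)  # momentum of a parent gluon
print(quark_number, quark_momentum, gluon_momentum)
```

All three printed values should vanish up to numerical precision, which is the statement that the evolution preserves the counting and momentum sum rules.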




Figure 2: Example PDFs at NLO at $Q^2 = 4$ GeV$^2$ and $Q^2 = 10^4$ GeV$^2$: the gluon density, the
sea density and the valence densities $u_v = u - \bar{u}$ and $d_v = d - \bar{d}$.

The DGLAP equations allow the PDFs to be calculated perturbatively at any scale, once they
have been measured at a given scale. Figure 2 shows example NLO PDFs for the gluon, the
valence quarks and the sea quarks, for two values of $Q^2$. While the gluon and the sea distributions
increase very quickly with $Q^2$, the non-singlet valence distributions are much less affected by
the evolution.

Higher orders should also be accounted for in the partonic matrix element $\hat{\sigma}$ of Eq. 6, such that
$\hat{\sigma} \to \hat{\sigma}_0 + \hat{\sigma}_1(\mu_F, \alpha_s(\mu_R)) + \ldots$. At LO in $\alpha_s$ the $\mu_F$ dependence can be absorbed into the PDFs
but beyond LO this cannot be done in a process independent way, and hence $\hat{\sigma}$ depends on
$\mu_F$ and on the factorisation scheme. An all orders calculation then cancels the $\mu_F$ dependence
of the PDFs with the $\mu_F$ dependence of $\hat{\sigma}$. The factorisation scheme most commonly used
is also the $\overline{\rm MS}$ scheme$^2$, and a common choice of factorisation and renormalisation scales is
$\mu_F = \mu_R = Q$. It is conventional to estimate the influence of uncalculated higher order terms
by varying both scales by factors of two.

At NLO the relationships between the structure functions (or any other scattering cross section) 
and the PDFs are modified in a factorisation scheme dependent way. The modifications are 
characterised by Wilson coefficient functions C for the hard scattering process expressed as a 
perturbative series. 

$^2$ The DIS scheme [56] is also useful since it is defined such that the coefficients of higher order terms for the
structure function $F_2$ are zero and so $F_2$ retains its QPM definition in this scheme.

In the $\overline{\rm MS}$ scheme Eqs. 10 and 11 take additional corrections to the parton terms such that for
$F_2$ the relation becomes, with the factorisation and renormalisation scales both set to $Q^2$:

F 2 (x,Q 2 ) = xJ2eUf&Q 2 ) + 

i 

f* [E (j)<i?(v,Q 2 ) + 4c 2 , g (£) 5 (,,Q 2 ) 



(25) 



The $C_{2,q}$ and $C_{2,g}$ terms are the coefficient functions for $q$ induced and $g$ induced scattering
contributing to $F_2$. Here the scaling violations are seen explicitly in the additional term
proportional to $\alpha_s$ where the integrand is sensitive to partonic momentum fractions $y > x$. For low
and medium $x$, the integral is dominated by the second term and $y \sim x$. Hence, from the scale
dependence of $\alpha_s$, the derivative $dF_2/d\ln Q^2$, to the first order in $\ln Q^2$, is driven by the product
of $\alpha_s$ and the gluon density$^3$. By replacing the $e^2$ couplings with the corresponding ones for
$Z/\gamma$ interference and pure $Z$ exchange as in Eq. 10 the QCD corrected formulae for $F_2^{\gamma Z}$ and
$F_2^{Z}$ are obtained.

QCD corrections to $F_L$ and $xF_3$ must also be taken into account. For $xF_3$ the LO QCD corrected
formula is



xF 3 (x,Q 2 



■E^ s m 2 



a s (Q 2 
2tt 



1 dy 



3,g 



,JVS 



(y,Q 2 ) 



(26) 



and is independent of the $g$ density resulting in much weaker scaling violations than for $F_2$.
Finally for $F_L$



F L (x,Q 2 



a s (Q 2 ) f 1 dy 



2tt 



I 7 E e ^(^)^Q 2 ) + ^, s Q 5 (,,Q 



(27) 



The coefficient functions at LO in the $\overline{\rm MS}$ scheme are given below for completeness



C 3t8 (x) 
Cl, s {x) 



r 



in 



l-x 



(.T 2 + (l-x) 2 )ln- 



-(9 + 5x) 



- l + 8x(l-x) 



Anjx{l — x) 
C 2 Ax)-\{l + x) 



(28) 



In the DIS scheme, $C_{2,q}(x)$ and $C_{2,g}(x)$ are zero.



To date an enormous variety of processes have been measured at colliders and confronted with
the predictions of pQCD. These include not only inclusive measurements of DIS cross sections,
but also semi-inclusive measurements of jet production rates, angular distributions, and
multiplicities in DIS and in hadronic colliders as well as in $e^+e^-$ collisions. In all cases pQCD
provides a good description of the measurements.



$^3$ At LO, $dF_2/d\ln Q^2\,(x, Q^2)$ at low $x$ is proportional to $x g(x, Q^2)$ to a good approximation. At NLO and
beyond, a larger range of values contributes to the integral, and $dF_2/d\ln Q^2\,(x, Q^2)$ is sensitive to $g(y, Q^2)$ for
$y > x$.

2 Experimental Constraints



In the following sections we describe the main features and results of experiments that put 
constraints on unpolarised proton PDFs. For reasons of space we do not provide a complete 
description of all experimental data, but rather focus on those experiments whose data is used in 
global approaches to extract the PDFs. The chapter naturally divides into common measurement 
techniques in DIS, measurements from fixed target DIS experiments, results from the HERA 
ep collider which dominates the bulk of precision proton structure data, and results from hadro- 
production experiments. The LHC experiments will be described in chapter 4. A convenient 
online repository of experimental scattering data for PDF determinations can be found at [57]. 
A summary of the experimental constraints described here will be given in section 2.5 that 
concludes this chapter. 

2.1 Measurement Techniques in DIS 

The following sections outline some of the general issues faced by experimentalists in perform- 
ing their measurements. These include corrections to the measured event rates to account for 
detector losses due to inefficiency and resolution effects, as well as theory-based corrections 
to extract structure functions and to take into account the sometimes large effects of QED ra- 
diation. Finally fixed target measurements are discussed which often require corrections for 
nuclear targets and kinematic effects that arise at low Q 2 . 

2.1.1 Detector efficiency and resolution corrections 

Experimental corrections to account for the limited and imperfect detector acceptance are per- 
formed using Monte Carlo simulations but require an input PDF to be used. Thus the acceptance 
corrections are weakly dependent on these input PDFs. Experimenters circumvent this problem 
using an iterative approach whereby the measured structure functions are then further used to 
tune the MC input which leads to a modified measurement. The procedure is stopped when 
the iterations converge, i.e. the difference in the measurements changes by a small amount. 
Typically this occurs after one or two iterations [58]. 

2.1.2 Extraction of structure functions 

The measured structure functions and differential cross sections are quoted at a point in x and Q 2 
and are derived from bin integrated values. A correction is needed to convert the measurement 
to a differential one. This is usually performed with a parameterisation of the cross section 
derivatives across the bin volume. This can be done by weighting each event by the ratio of 
structure functions at the bin centre and the x, Q 2 of that event [59]. Alternatively the bin 
integrated measurement can be corrected by a single factor which is the ratio of the structure 
function at the bin centre to the bin integrated value derived from an analytical calculation [60]. 
The dependence of the correction on the input parameterisation is usually small. 
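
The single-factor variant of the bin-centre correction can be sketched as follows. This is a one-dimensional toy under stated assumptions: `toy_f2` is an arbitrary smooth shape chosen for illustration (not a fit to data), and `bin_centre_factor` is a hypothetical helper name.

```python
import numpy as np

def toy_f2(x):
    """Arbitrary smooth stand-in for a structure function parameterisation."""
    return x ** -0.3 * (1.0 - x) ** 3

def bin_centre_factor(func, lo, hi, centre, n=2001):
    """Ratio of func at the quoted point to its average over the bin [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    vals = func(grid)
    # trapezoid rule written out explicitly
    integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))
    average = integral / (hi - lo)
    return func(centre) / average

# Multiply the bin-averaged measurement by this factor to quote it at x = 0.015.
factor = bin_centre_factor(toy_f2, 0.01, 0.02, 0.015)
```

For a function that is linear across the bin the factor is exactly one at the bin midpoint. The correction therefore only probes the curvature of the parameterisation, which is why its dependence on the input is usually small.
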

Early experiments often presented results as final corrected values of the electromagnetic
structure functions F 2 and F L (or R). To make this decomposition of the cross section,
experimentalists either restricted themselves to the phase space region of low y, where the
contribution of F L is strongly suppressed, or assumed a value of F L in order to extract F 2 .
Both of these approaches have been used, but recent publications focus on
the measurement of the differential cross section as the primary measurement which necessarily
has fewer assumptions. Extractions of the individual structure functions are also provided for
convenience. The H1 and ZEUS experiments recommend the use of differential cross sections
only as input to further QCD analyses of the data [61].

By utilising different beam energies, scattering cross sections can be measured at fixed points 
in x and Q 2 but different y thus allowing direct measurements of the structure function F L to be 
made 4 . Since the technique relies on the measurement of the difference between cross section 
measurements for two or more values of √s the experimental uncertainties on F L are sensitive
to systematic uncertainties in the relative normalisation of the data sets, and are often highly 
correlated point-to-point. 
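
A minimal sketch of such an extraction is given below. It assumes the standard one-photon-exchange form of the reduced cross section, sigma_r = F 2 - (y^2 / Y+) F L with Y+ = 1 + (1 - y)^2 and y = Q 2 / (s x). That relation is taken as an assumption here rather than quoted from this section. The function name `extract_f2_fl` is hypothetical.

```python
import numpy as np

def extract_f2_fl(sqrt_s, sigma_r, x, q2):
    """Straight-line fit of sigma_r = F2 - (y^2 / Y+) * FL at fixed (x, Q^2).

    sqrt_s  : centre-of-mass energies of the data sets (GeV)
    sigma_r : reduced cross sections measured at the same (x, Q^2)
    """
    s = np.asarray(sqrt_s, dtype=float) ** 2
    y = q2 / (s * x)                          # inelasticity at each beam energy
    lever = y ** 2 / (1.0 + (1.0 - y) ** 2)   # y^2 / Y+
    slope, intercept = np.polyfit(lever, np.asarray(sigma_r, dtype=float), 1)
    return intercept, -slope                  # (F2, FL)
```

Since F L is the slope of the fitted line, a shift in the relative normalisation of one data set moves every extracted point in a correlated way, which is the sensitivity noted above.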

2.1.3 Reconstruction Methods 

When the centre-of-mass energy of the interaction is known, the DIS cross section depends on 
two variables only, and the kinematic variables x, y and Q 2 can be fully reconstructed from two 
independent measurements. Fixed target experiments of charged lepton DIS generally used the 
measurement of the energy and angle of the scattered lepton to reconstruct the kinematics - the 
lepton method. 

The use of colliding beams to measure DIS cross sections allowed new detector designs to be 
employed whereby the HERA experiments could fully reconstruct the hadronic final state in 
most of the accessible kinematic domain. Hence, x, y and Q 2 in NC interactions may be de- 
termined using energy and angular measurements of the scattered lepton alone, measurements 
of the inclusive hadronic final state (HFS), or some combination of these. This redundancy 
allows very good control of the measurements and of their systematic uncertainties. In contrast 
charged lepton CC interactions may only be reconstructed using measurements of the hadronic 
final state since the final state neutrino is unobserved. Each method has different experimen- 
tal resolution and precision as well as different influence from QED radiative corrections (see 
below). A convenient summary of the different methods is given in [61], and the methods are
compared in [62].

At HERA the main NC reconstruction methods used are the double-angle method [63] (using
the polar angles of the lepton and HFS), and the eΣ method [64] which combines the scattered
lepton energy and angle with the total energy and longitudinal momentum difference, E - Pz,
of the HFS.
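
The methods named above can be sketched with the commonly quoted textbook formulae. The sketch assumes massless particles, a polar angle θ measured from the proton beam direction, and hadronic inputs `sigma_h` (the summed E - Pz of the HFS) and `pt_h` (its transverse momentum). The function names are illustrative and not taken from any experiment's software.

```python
import math

def lepton_method(e_beam, e_prime, theta, s):
    """Lepton method: scattered lepton energy and polar angle only."""
    y = 1.0 - e_prime / (2.0 * e_beam) * (1.0 - math.cos(theta))
    q2 = 2.0 * e_beam * e_prime * (1.0 + math.cos(theta))
    return q2 / (s * y), y, q2

def double_angle_method(e_beam, theta, sigma_h, pt_h, s):
    """Double-angle method: lepton angle and inclusive hadronic angle."""
    tan_half_gamma = sigma_h / pt_h           # tan(gamma_h / 2)
    tan_half_theta = math.tan(theta / 2.0)
    denom = tan_half_gamma + tan_half_theta
    y = tan_half_gamma / denom
    q2 = 4.0 * e_beam ** 2 / (tan_half_theta * denom)
    return q2 / (s * y), y, q2

def e_sigma_method(e_beam, e_prime, theta, sigma_h, s):
    """e-Sigma method: Q^2 from the lepton, x from the Sigma method."""
    q2 = 2.0 * e_beam * e_prime * (1.0 + math.cos(theta))
    y_sigma = sigma_h / (sigma_h + e_prime * (1.0 - math.cos(theta)))
    q2_sigma = (e_prime * math.sin(theta)) ** 2 / (1.0 - y_sigma)
    x = q2_sigma / (s * y_sigma)
    return x, q2 / (s * x), q2
```

Each function returns (x, y, Q 2 ). For an ideal event without radiation or detector smearing all three agree. They differ in how measurement errors and initial state photon emission propagate, which is what makes the redundancy useful.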

DIS experiments using a (wide band) muon neutrino beam face an additional problem in deter- 
mining the incident neutrino energy. It is usually reconstructed as the sum of the momentum of 
the scattered muon and of the energy of the HFS measured in a calorimeter. A third indepen- 
dent measurement, usually taken to be the angle of the scattered muon, is then needed to fully 
reconstruct the kinematics. 

4 Additional techniques at fixed √s have also been employed to determine F L indirectly, primarily as
consistency checks of QCD.

2.1.4 QED radiative corrections



The treatment of radiative corrections is an important aspect of DIS scattering cross sections 
measurements and was first discussed in [65]. The corrections allow measured data to be cor- 
rected back to the Born cross section in which the influence of real photon emission and virtual 
QED loops are removed. It is the Born cross sections that are then used in QCD analyses of 
DIS data to extract the proton PDFs (see section 3). This topic has been extensively discussed 
for HERA data in [66] and the references therein. 

Corrections applied to the measured data are usually expressed as the ratio of the Born cross 
section to the radiative cross section and can have a strong kinematic dependence since for 
example, the emission of a hard real photon can significantly skew the observed lepton momen- 
tum. Thus the corrections also depend on the detailed experimental treatment and the choice 
of reconstruction method used to measure the kinematic quantities. In ep scattering, hard final 
state QED radiation from the scattered electron is experimentally observable only at emission 
angles which are of the size of the detector spatial resolution. 

Complete QED calculations at fixed order in α are involved and often approximations are used,
particularly for soft collinear photon emission. These approximations are readily implemented
into Monte Carlo simulations allowing experimentalists to account for radiative effects easily.
For the HERA measurements [61] O(α) diagrams are corrected for with the exception of real
photon radiation off the quark lines. This is achieved using Monte Carlo implementations [67,
68] checked against analytical calculations [48, 69] which agree to within 0.3-1% in the NC
case (2% for x > 0.3) and to within 2% for the CC case. The quarkonic radiation piece is
known to be small and is accounted for in the uncertainty given above.

The real corrections are dominated by emission from the lepton lines and are sizable at high
and low y [66]. For example, at √s = 301 GeV and Q = 22 GeV the leptonic ep corrections
are estimated to be +40% at y = 0.75 when using the lepton reconstruction method. This is
dramatically reduced to +15% if the eΣ reconstruction method is used, and if an analysis cut of
E - Pz > 35 GeV is employed the correction is further reduced to +8% [70].

The vacuum polarisation effects are also corrected for such that published cross sections
correspond to α(Q = 0) = 1/137.04. These photon self energy contributions depend only on Q and
amount to a correction of -6% for Q = M Z and -4% for Q = 12 GeV.

2.1.5 Higher order weak corrections 

The weak corrections are formally part of the complete set of O(α) radiative corrections to DIS
processes but are often experimentally treated separately to the QED radiative corrections dis- 
cussed above. The weak parts include the self energy corrections, weak vertex corrections and 
so-called box diagrams in which two heavy gauge bosons are exchanged [6( ]. The self energy 
corrections depend on internal loops including all particles coupling to the gauge bosons e.g. 
the Higgs boson, the top quark and even new particle species. For this reason experimentalists 
sometimes publish measurements in which no corrections for higher order weak corrections 
are accounted for. Rather, comparisons to theoretical predictions are made in which the weak
corrections are included in the calculations. Care must be taken to define the scheme (i.e. the
set of input electroweak parameters used) within which the corrections are defined.

Two often used schemes are the on-mass-shell scheme [ ], and the G μ scheme [72]. In the
former the EW parameters are all defined in terms of the on shell masses of the EW bosons.
The weak mixing angle θ W is then related to the weak boson masses by the relation
$\sin^2\theta_W = 1 - M_W^2/M_Z^2$ to all perturbative orders. In the G μ scheme, the Fermi
constant, which is very precisely known through the measurements of the muon lifetime [73], is
used instead of M W .

The scheme dependence is of particular importance in the CC scattering case where the
influence of box diagrams is relatively small and the corrections are dominated by the self
energy terms of the W propagator affecting the normalisation of the cross section. In the G μ
scheme these leading contributions to the weak corrections are already absorbed in the
measured value of G μ and the remaining corrections are estimated to be at the level of 0.5% at
Q 2 = 10 000 GeV 2 [74] where experimental uncertainties are an order of magnitude larger. For 
the HERA NC structure function measurements the EW corrections are estimated to reach the 
level of ~ 3% at the highest Q 2 [75] and should be properly accounted for in fits to the data. 

2.1.6 Target mass corrections and higher twist corrections 

For scattering processes at low scales approaching soft hadronic scales such as the target nu- 
cleon mass, additional hadronic effects lead to kinematic and dynamic 1/Q 2 power corrections 
to the factorisation ansatz Eq. 6. Both of these corrections are important for DIS at low to 
moderate Q 2 , in particular in the kinematic domain covered by fixed target DIS experiments. 

In DIS, power corrections of kinematic origin, the target mass corrections (TMC), arise from 
the finite nucleon mass. For a mass m N of the target nucleon, the Bjorken x variable is no 
longer equivalent to the fraction of the nucleon's momentum carried by the interacting parton 
in the infinite momentum frame. This momentum fraction is instead given by the so-called 
Nachtmann variable ξ:

$$ \xi = \frac{2x}{1+\gamma} \quad \text{with} \quad \gamma = \sqrt{1 + 4 x^2 m_N^2 / Q^2} $$

which differs from x at large x (above ~ 0.5) and low to moderate Q 2 . Approximate formulae
which relate the structure functions on a massive nucleon $F_i^{TMC}(x, Q^2)$ to the massless limit
structure functions $F_i$ can be found in [76,77], for example:

$$ F_2^{TMC}(x, Q^2) = \frac{x^2}{\xi^2 \gamma^3} F_2(\xi, Q^2) + \frac{6 x^3 m_N^2}{Q^2 \gamma^4} \int_\xi^1 \frac{d\xi'}{\xi'^2} F_2(\xi', Q^2) . $$

The ratios $F_i^{TMC} / F_i$ rise above unity at large x, with the rise beginning at larger values of x as
Q 2 increases. The target mass correction can be quite large: for x = 0.8 it reaches ~ 30% at
Q 2 = 5 GeV 2 .
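
To give a feeling for the size of the kinematic shift, the Nachtmann variable, 2x / (1 + γ) with γ = sqrt(1 + 4 x^2 m_N^2 / Q^2), can be evaluated directly. The nucleon mass of 0.938 GeV and the function name are assumptions of this sketch.

```python
import math

NUCLEON_MASS_GEV = 0.938  # approximate nucleon mass, assumed for illustration

def nachtmann_xi(x, q2, m_n=NUCLEON_MASS_GEV):
    """Nachtmann variable for Bjorken x and Q^2 (in GeV^2)."""
    gamma = math.sqrt(1.0 + 4.0 * x ** 2 * m_n ** 2 / q2)
    return 2.0 * x / (1.0 + gamma)

xi_high_x = nachtmann_xi(0.8, 5.0)   # large x, low Q^2: sizable shift
xi_low_x = nachtmann_xi(0.01, 5.0)   # small x: practically equal to x
```

At x = 0.8 and Q 2 = 5 GeV 2 the variable comes out near 0.73, roughly 9% below x, whereas at x = 0.01 the difference is negligible. A shift of this size at large x, where the structure functions fall steeply, produces a much larger change in the structure function itself.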

In addition, power corrections of dynamic origin, arising from correlations of the partons within
the nucleon, can also be important at low Q 2 . The contribution of these higher twist terms [78]
to the experimentally measured structure functions $F_i^{exp}$ can be written as

$$ F_i^{exp}(x, Q^2) = F_i^{TMC}(x, Q^2) + H_i(x)/Q^2 + \ldots \qquad (29) $$

These terms have been studied in [ ] and more recently in [80, 81] and found to be sizable at
large x; however, they remain poorly known. QCD analyses of proton structure that make use
of low Q 2 data may impose kinematic cuts to exclude the measurements made at low Q 2 and
high x (i.e. at low W 2 ) that may be affected by these higher twist corrections. Alternatively, a
model can be used for the H i (x) terms, whose parameters can be adjusted to the data.

2.1.7 Treatment of data taken with a nuclear target 

Neutrino DIS experiments have used high Z, A targets such as iron or lead 5 , which provide
reasonable event rates despite the low ν interaction cross section. The expression of the measured
quantities, F A , in terms of the proton parton densities, has to account for the facts that: 

• the target is not perfectly isoscalar; e.g. in iron, there is a 6.8% excess of neutrons over 
protons; 

• nuclear matter modifies the parton distribution functions; i.e. the parton distributions in 
a proton bound within a nucleus of mass number A, f A (x, Q 2 ), differ from the proton 
PDF f(x, Q 2 ). Physical mechanisms of these nuclear modifications (shadowing effect at 
low x, the nucleon's Fermi motion at high x, nuclear binding effects at medium x) are 
summarised e.g. in [83,84]. The ratios R A = f A (x, Q 2 )/f(x, Q 2 ) are called nuclear 
corrections, and can differ from unity by as much as 10-20% in medium-size nuclei.

Nuclear corrections are obtained by dedicated groups, from fits to data of experiments that used
nuclear (A) and deuterium (d) targets: the structure function ratios $F_2^A / F_2^d$ measured in DIS
(at SLAC by E139, and at CERN by the EMC and NMC experiments), and the ratios of Drell-Yan
qq̄ annihilation cross sections $\sigma_{DY}^A / \sigma_{DY}^d$ measured by the E772 and E866 experiments. Recent
analyses also include the measurements of inclusive pion production obtained by the RHIC
experiments in deuterium-gold collisions at the Brookhaven National Laboratory (BNL). The
measurements of charged current DIS structure functions at experiments using a neutrino beam
(see sections 2.2.4 and 2.2.6) can also be included, as done in [85] for example.

Figure 3 shows an example of nuclear corrections for the u v , ū, s and g densities at a scale
of Q 2 = 10 GeV 2 , as obtained from the recent analysis described in [85]. They are shown
for beryllium (A = 9), iron (A = 56), gold (A = 197) and for lead (A = 208). The size
of the nuclear corrections is larger for heavier nuclei. Nuclear effects in deuterium are usually
neglected, although some QCD analyses [81] account for them explicitly. They were studied
in [86] by analysing data on F 2 d / F 2 p and were found to be small, O(1-2)%. In [81] nuclear
corrections on F 2 d were found to be below 2% for x < 0.7, rising above 10% for x > 0.8.

Nuclear corrections derived from other analyses are also overlaid in Fig. 3. While the correction 
factors obtained by these analyses are in reasonable agreement for the up and down quarks, 
sizable differences are observed for other flavours and for the gluon 6 . 

5 An exception being the WA25 experiment [82], which measured νd and ν̄d cross sections using a bubble
chamber exposed to the CERN SPS wide-band neutrino and anti-neutrino beams, in the mid-eighties.

6 In the case of the strange density, the differences seen in Fig. 3 are largely due to the fact that these analyses
used different proton PDFs, for which the strange density differs significantly.

Figure 3: Example of nuclear corrections obtained from the analysis of [85], at Q 2 = 10 GeV 2 ,
for four different nuclei. From [85].

Concerns have been raised in [87-89], regarding the possibility that nuclear corrections may be 
different for NC and CC DIS. Such a breaking of factorisation, if true, would cast serious doubts 
on the constraints on proton PDFs derived from neutrino DIS experiments 7 . However, this was 
not confirmed by the recent analysis described in [85]. The analysis of [90] also reported no 
tension between the NC and CC DIS data off heavy nuclei. 



2.2 Measurements from fixed target DIS experiments 

A brief description of the main fixed target DIS experiments is given in this section. Further 
details can also be found in [91]. The x, Q 2 ranges and beam energies of the measurements are 
summarised in Tab. 1 in section 2.5. This latter section also contains a representative compila- 
tion of the measurements described here (see Fig. 15 and Fig. 16). 

7 As will be seen later, these experiments set important constraints on the separation between valence and sea 
densities, and on the strange PDF that, otherwise, is largely unconstrained.

2.2.1 SLAC



The first experiments to probe the region of deep inelastic scattering were conducted by a collab- 
oration between the Stanford Linear Accelerator group, the Massachusetts Institute of Technol- 
ogy, and the California Institute of Technology. The experiments used an electron linac capable 
of accelerating electrons up to 20 GeV and a momentum analysing spectrometer arm equipped 
with scintillator hodoscopes and multi-wire proportional chambers (MWPCs) for electron de- 
tection, triggering and background rejection. The experiments were performed in the period 
1970 to 1985 using one of three spectrometer arms selecting scattered electron momenta up 
to 1.6, 8, and 20 GeV. All three were mounted on a common pivot around the target area and 
able to measure different scattering angles. The spectrometers were designed to decouple the 
measurement of scattering angle of the electron and its momentum. This was achieved by care- 
ful design of the spectrometer optics in which dipole and focusing quadrupole magnets were 
used to deflect electrons in the vertical plane depending on momentum, and causing horizontal 
dispersion of the electrons depending on the scattering angle. 

The major experiments relevant to QCD analyses of proton structure are E49a/b, E61, E87, 
E89a/b, E139 and E140. In total some 6000 data points were measured for ep and ed scattering.
The latter two high statistics experiments used the 8 GeV spectrometer, with a 30° vertical bend 
to deflect scattered electrons into the detector assembly region. 

Using improved methods of applying radiative corrections, and better knowledge of R [92], 
the SLAC data were re-evaluated with a more rigorous error treatment yielding smaller
uncertainties for the relative normalisations between the individual experiments. A final
summary dataset of all SLAC experiments combined, with precise determinations of F 2 p and F 2 d ,
was published [93, 94]. This combined data set achieved a typical 3% statistical uncertainty,
and similar systematic uncertainties. The measurements cover a region extending to high x, 
0.06 < x < 0.9, and 0.6 < Q 2 < 30 GeV 2 . Despite their very good precision, the measure- 
ments at highest x are usually not included in QCD analyses because higher twist effects are 
important in the domain where they were made (see section 3.2.1). 

2.2.2 BCDMS 

The BCDMS experiment [95] was a collaboration between the research institutes of Bologna, 
CERN, Dubna, Munich and Saclay, formed in 1978 and utilised the CERN SPS M2 muon beam 
with energies of 100, 120, 200 and 280 GeV. The experiment was designed to enable precise
measurements of R to be made with tight control of systematic uncertainties using the high intensity
muon beam. The intense beam spills placed stringent requirements on the experimental trigger 
and background rejection abilities. The experiment collected high statistics data on proton and 
deuteron targets [96, 97]. The targets were located serially along the common axis of eight iron 
toroid modules, with each module consisting of scintillator hodoscopes and MWPCs. 

The primary measurements were the inclusive double differential cross sections corrected for 
radiative effects and presented as F 2 (x, Q 2 ). Measurements of the total inclusive cross section 
at different centre of mass energies allowed R to be determined. The data cover the region 
0.06 < x < 0.8 and 7 < Q 2 < 260 GeV 2 . The final measured values of F 2 have a typical
statistical precision of 1-2% and a similar systematic uncertainty, which at high x reaches up
to 5% arising from the spectrometer field calibration and resolution.

The BCDMS data provide a precise measurement of the F 2 structure function in the valence 
region of high x. The R data show similar x dependence of the two polarised pieces of the cross 
section at high x, but indicate an increasing longitudinal component at low x consistent with 
the expectation of an increasing gluon component of the proton. 

2.2.3 NMC 

The New Muon Collaboration (NMC) was a muon scattering DIS experiment at CERN that 
collected data from 1986 to 1989 using the M2 muon beam line from the CERN SPS. It was
designed to measure structure function ratios with high precision. 

The experimental apparatus [98] consisted of an upstream beam momentum station and ho- 
doscopes, a downstream beam calibration spectrometer, a target region and a muon spectrom- 
eter. The muon beam ran at beam energies of 90, 120, 200, and 280 GeV. The muon beams 
illuminated two target cells containing liquid hydrogen and liquid deuterium placed in series 
along the beam axis. Since the spectrometer acceptance was very different for the two targets
they were regularly alternated. The muon spectrometer was surrounded by several MWPCs and 
drift chambers to allow a full reconstruction of the interaction vertex and the scattered muon 
trajectory. Muons were identified using drift chambers placed behind a thick iron absorber. 

The experiment published measurements of the proton and deuteron differential cross sections
d 2 σ/dx dQ 2 in the region 0.008 < x < 0.5 and 0.8 < Q 2 < 65 GeV 2 , from which the structure
functions F 2 p and F 2 d were extracted [58]. A statistical precision of 2% across a broad region of
the accessible phase space was achieved, and a systematic precision of between 2 and 5%. NMC
have also published direct measurements of R(x, Q 2 ) in the range 0.0045 < x < 0.11 [58]
which provide input to the gluon momentum distribution.

In addition the collaboration published precise measurements of the ratio F 2 d / F 2 p [99] which is
sensitive to the ratio of quark momentum densities d/u. By measuring the ratio of structure
functions several sources of systematic uncertainty cancel including those arising from detector
acceptance effects and normalisation. Thus measurements in regions of small detector
acceptance could be performed and these cover the region 0.001 < x < 0.8 and 0.1 < Q 2 < 145
GeV 2 with a typical systematic uncertainty of better than 1%. The ratio F 2 d / F 2 p was seen to
decrease as x → 1, indicating that d(x) falls more quickly than u(x) at high x; the behaviour of
d/u as x approaches 1 remains however unclear.

In 1992 NMC published the first data on the Gottfried sum rule [100] which in the simple quark
parton model states that

$$ \int_0^1 \frac{dx}{x} \left( F_2^p - F_2^n \right) = \frac{1}{3} \int_0^1 dx \, (u_v - d_v) + \frac{2}{3} \int_0^1 dx \, (\bar{u} - \bar{d}) , $$

and, assuming $\bar{u} - \bar{d} = 0$, should take on a value of 1/3. The initial NMC measurement indicated a violation of
this assumption of a flavour symmetric sea. This was verified by the final NMC analysis [101]
in which the Gottfried sum was determined to be 0.235 ± 0.026 at Q 2 = 4 GeV 2 , which implies
that $\int dx \, (\bar{d} - \bar{u}) \approx 0.15$, indicating a significant excess of d̄ over ū.
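
The arithmetic behind the last statement can be made explicit. With the valence integral fixed to one by the quark content of the proton, the quark parton model sum equals 1/3 minus 2/3 of the integrated d̄ - ū asymmetry. The helper below (a hypothetical name) inverts this relation and propagates only the quoted uncertainty on the sum.

```python
def sea_asymmetry_from_gottfried(s_g, s_g_err):
    """Integrated (dbar - ubar) implied by a measured Gottfried sum.

    Uses S_G = 1/3 - (2/3) * integral(dbar - ubar), i.e. the quark parton
    model with the valence integral of (u_v - d_v) equal to one.
    """
    integral = 1.5 * (1.0 / 3.0 - s_g)
    return integral, 1.5 * s_g_err

asymmetry, asymmetry_err = sea_asymmetry_from_gottfried(0.235, 0.026)
```

For the final NMC value this gives about 0.15 with an uncertainty of about 0.04, consistent with the number quoted above.
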

2.2.4 CCFR/NuTeV



The Chicago-Columbia-Fermilab-Rochester detector (CCFR) was constructed at Fermilab to
study neutrino induced DIS on an almost isoscalar iron target. The detector
used the wide band mixed ν μ and ν̄ μ beam reaching energies of up to 600 GeV. The CCFR
experiment collected data in 1985 (experiment E744) and in 1987-88 (E770).

In 1996 the NuTeV experiment (E815), using the same detector, took data in a high statistics
neutrino run with the primary aim of making a precision measurement of $\sin^2\theta_W$. The major
difference between NuTeV and its predecessor CCFR was the ability to select ν μ or ν̄ μ beams which
also limited the upper energy of the wide band beam to ~ 500 GeV. The neutrino beam was
alternated every minute with calibration beams of electrons and hadrons throughout the one 
year data taking period. This allowed a precise calibration of the detector energy scales and 
response functions to be obtained. 

The neutrino beam was produced by protons interacting with a beryllium target. Secondary 
pions and kaons were sign selected and focused into a decay volume. The detector was placed 
1.4 km downstream of the target region and consisted of a calorimeter composed of square 
steel plates interspersed with drift chambers and liquid scintillator counters. A toroidal iron 
spectrometer downstream of the calorimeter provided the muon momentum measurement using 
a 1.5 T magnetic field. In total NuTeV logged 3 × 10^18 protons on target.

Structure function measurements. CCFR published measurements of F 2 and xF 3 [102] with
a typical precision of 2-3% on F 2 which is largely dominated by the statistical uncertainty
on the data. The data cover the region 0.015 < x < 0.65 and 1.26 < Q 2 < 126 GeV 2 .
As discussed in section 1.3, these measurements are a direct test of the total valence density. NuTeV
measured the double differential cross sections d 2 a/dxdy from which the structure functions 
F 2 and xF 3 were determined [103-105] from linear fits to the neutrino and anti-neutrino cross 
section data. The data generally show good agreement between the two experiments and the 
earlier low statistics CDHSW experiment [106]. However, at x > 0.4 an increasing systematic 
discrepancy between CCFR and NuTeV was observed. A mis-calibration of the magnetic field 
map of the toroid in CCFR explains a large part of this discrepancy [103], and the NuTeV 
measurements are now believed to be more reliable. 

Semi-inclusive di-muon production. In addition to providing inclusive cross section
measurements, both experiments also measured the semi-inclusive μ+μ- production cross section
in ν μ-nucleon and ν̄ μ-nucleon interactions [107]. Such di-muon events arise predominantly from
charged-current interactions off a strange quark, with the outgoing charmed meson undergoing
a semi-leptonic decay, as illustrated in Fig. 4. These measurements thus provide a direct
constraint on the strange quark density in the range 0.01 < x < 0.4. Moreover, the separation
into ν μ and ν̄ μ cross sections allows a separation of the s and s̄ contributions to be made, since
di-muon events are mainly produced from W+ s → c → μ+ + X with an incoming ν μ beam,
or from W- s̄ → c̄ → μ- + X with a ν̄ μ beam. These data favour a non-vanishing asymmetry
s - s̄, as discussed further in section 3.5.5.

Figure 4: Exclusive di-muon production in deep inelastic scattering.

2.2.5 E665 

This muon scattering experiment at Fermilab operated from 1987 to 1992 measuring deep inelastic
scattering of muons off proton and deuteron targets in regularly alternating target cells [59].
The data cover the range 0.0009 < x < 0.4 and 0.2 < Q 2 < 64 GeV 2 . 

The experiment consisted of a beam spectrometer, target region and main spectrometer. The 
beam spectrometer was designed to detect and reconstruct the beam muon momentum using 
trigger hodoscopes, multi-wire proportional chambers and a dipole magnet. The target region 
consisted of cells filled with liquid hydrogen and liquid deuterium placed in a field free region 
and which were alternated regularly. The main spectrometer was located immediately down- 
stream of the target region and consisted of two large dipole magnets with reversed polarity. 
A series of drift and multi-wire proportional chambers placed inside and downstream of both 
magnets provided comprehensive tracking coverage. Further downstream a lead-gas sampling 
electromagnetic calorimeter was placed in front of iron absorbers followed by the muon detec- 
tors consisting of planes of proportional tubes and trigger hodoscopes. 

The measurements of F 2 p and F 2 d typically have statistical uncertainties of 6% and 5%
respectively and systematic uncertainties of better than 4%. The E665 μp and μd data partially overlap
with measurements from NMC at higher Q 2 . The x range of E665 data overlaps with that
covered by the HERA experiments H1 and ZEUS (see section 2.3) though the HERA data lie
at higher values of Q 2 . Comparisons between the experiments show good agreement between
NMC and E665, and the HERA data show a smooth continuous evolution for fixed x with
increasing Q 2 .

2.2.6 CHORUS 

The CERN Hybrid Oscillation Research Apparatus (CHORUS) [108] was originally a ν μ → ν τ
appearance experiment in operation at CERN from 1994-1997 [109]. In 1998 the run was
exclusively used for differential measurements of neutrino induced CC DIS using the
lead-scintillator calorimeter as an active target [110] as well as studying the Z/A dependence of
the total CC cross section [111]. The experiment utilised the 450 GeV proton beam from the
SPS which was directed to a target producing charged particles. These were sign selected and
focused into a decay volume followed by iron and earth to filter out the neutrinos which emerged
with a wide energy range 10 < E ν < 200 GeV. The detector consisted of a lead-fibre scintillator
calorimeter with nine planes of modules with alternating orientation in the plane transverse to
the beam. The muon spectrometer was made of six toroidal iron magnets interspersed with drift
chambers, scintillators and streamer tubes to reconstruct the muon momentum.

The differential cross sections in x, y and E ν are measured in the range 0.02 < x < 0.65 and in
0.3 < Q 2 < 82 GeV 2 . These are used to extract the structure functions F 2 and xF 3 in a linear 
fit to the y dependence of the cross sections for each x, Q 2 bin. The statistical uncertainty on 
F 2 is in the region of 1% and the systematic contribution to the uncertainty is typically below 
3% for x > 0.1 and increases at lower x. The data for xF 3 are in agreement with earlier 
measurements from CCFR [112] and the hydrogen target neutrino experiment CDHSW [106]. 
The F 2 measurements are in better agreement with those from CCFR than with NuTeV. 

2.3 The H1 and ZEUS experiments

The HERA collider was the first colliding beam ep accelerator operating at centre-of-mass
energies of 301 and later at 319 GeV. At the end of the operating cycle two short low energy runs
at √s = 225 and 250 GeV were taken for a dedicated F L measurement. At the highest centre of
mass energy the beams had energies of 920 GeV for the protons and 27.6 GeV for the electrons.
The two experiments utilising both HERA beams were H1 and ZEUS and they provide the bulk
of the precision DIS structure function data over a wide kinematic region. In particular, HERA
opened up the domain of x below a few 10^-3 which had been mostly unexplored by the fixed
target experiments. The fixed target experiments HERMES and HERA-B will not be discussed
in this article.
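
The quoted centre-of-mass energies follow from the beam energies through s = 4 E e E p for head-on beams with negligible masses, and the reach in x follows from Q 2 = s x y. The short sketch below checks both. The function names are illustrative, and the 820 GeV proton energy used for the earlier running is not stated in the text above; it is quoted here as general knowledge of HERA operation.

```python
import math

def ep_sqrt_s(e_lepton, e_proton):
    """Centre-of-mass energy (GeV) for head-on beams, neglecting particle masses."""
    return math.sqrt(4.0 * e_lepton * e_proton)

def x_min(q2, sqrt_s, y_max=1.0):
    """Smallest Bjorken x reachable at a given Q^2 (GeV^2), from Q^2 = s x y."""
    return q2 / (sqrt_s ** 2 * y_max)

sqrt_s_920 = ep_sqrt_s(27.6, 920.0)   # about 319 GeV
sqrt_s_820 = ep_sqrt_s(27.6, 820.0)   # about 301 GeV (proton energy assumed, see above)
lowest_x = x_min(10.0, sqrt_s_920)    # of order 1e-4 at Q^2 = 10 GeV^2
```

The last line illustrates why a collider with √s of about 300 GeV reaches x values well below 10^-3 at Q 2 of a few GeV 2 , a region that fixed target beams of a few hundred GeV cannot access at the same Q 2 .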

The accelerator operation is divided into three periods or datasets: HERA-I from 1992 to 2000, 
HERA-II from 2003 to 2007, and the dedicated Low Energy Runs taken in 2007 after which 
the accelerator was decommissioned. During the 2001-2003 upgrade of the accelerator and 
the experiments, spin rotators were installed in the lepton beam line allowing longitudinally 
polarised lepton beam data to be collected, with a polarisation of up to ±40%. In total H1
and ZEUS together collected almost 1 fb^-1 of data evenly split between lepton charges and
polarisations.

A review of the physics results of the H1 and ZEUS experiments can be found in [113], and the
HERA structure function results have been recently reviewed in [114].

2.3.1 Experimental Apparatus 

The two experiments were designed as general purpose detectors, nearly 4π hermetic, to analyse
the full range of ep physics with well controlled systematic uncertainties. The highly boosted 
proton beam led to asymmetric detector designs with more hadronic instrumentation in the 
forward (proton) direction which had to withstand high rates and high occupancies. 

The most significant differences between them are the calorimeters, which had an inner
electromagnetic section and an outer hadronic part. ZEUS employed a compensating Uranium
scintillator calorimeter located outside the solenoidal magnet providing a homogeneous field
of 1.4 T. H1 used a lead/steel liquid argon sampling calorimeter located in a cryostat within
the solenoid field of 1.16 T and a lead/scintillating fibre backward electromagnetic calorimeter
for detection of scattered leptons in neutral current processes. In both H1 and ZEUS, a muon
detector surrounded the calorimeter.

Figure 5: Left: the spread of the theoretical predictions for F 2 which were consistent with
pre-HERA data for Q 2 = 15 GeV 2 . Centre: The first F 2 measurements from H1 in 1992. Right:
The complete HERA-I measurements of F 2 .

Both experiments utilised drift chambers in the central regions for charged particle detection
and momentum measurements which were enhanced by the installation of precision silicon
trackers. They allowed the momenta and polar angles $\theta$ of charged particles to be measured in
the range of 7° < $\theta$ < 165°, the backward region of large $\theta$ being where the scattered electron
was detected in low $Q^2$ NC DIS events. In H1 an additional drift chamber gave access to larger
angles of up to ~172°.



2.3.2 Neutral Current measurements from H1 and ZEUS

Figure 5 (left) shows the spread in parameterisations of $F_2$ which existed prior to the first HERA
data. Most extrapolations from pre-HERA data indicated a "flattish" $F_2$ at low x, which was
also expected from Regge-like arguments. The first HERA results [115, 116] presented in 1993
were based on 30 nb$^{-1}$ of data taken in 1992 and showed a surprising, strong rise of $F_2$ towards
low x. An example [115] of these measurements is shown in Fig. 5 (centre). With the full
HERA-I dataset, the statistical uncertainty of these low x and low $Q^2$ measurements could be
reduced below 1%, with a systematic error of about 2%; the measurements are shown in Fig. 5
(right).

With increasing luminosity, high statistics were accumulated over the whole kinematic
domain [61]. Fig. 15 in the summary section 2.5 shows an overview of HERA $F_2$ measurements
together with data points from fixed target experiments. The very strong scaling violations
are clearly observed at low x. This indicates a large gluon density since at leading order
$\partial F_2 / \partial \ln Q^2$ is driven at low x by the product of $\alpha_s$ and the gluon density $g(x, Q^2)$
(see Eq. 25). At high x the scaling violations are negative: high x quarks split into a gluon
and a lower x quark. The curves overlaid are the result of QCD fits (see section 3) based on the
DGLAP evolution equations. The data show an excellent agreement with DGLAP predictions, over five
orders of magnitude in $Q^2$ and four orders of magnitude in x.

Figure 6: Left: The NC DIS cross section as a function of $Q^2$ for several values of x, measured
in $e^+p$ (blue symbols) and in $e^-p$ (red symbols) collisions. The Standard Model predictions
are overlaid, as the full and dashed curves, respectively. Right: The structure function $xF_3^{\gamma Z}$
extracted from the complete H1 HERA-I+II dataset. From [75, 122].

At very high $Q^2$, the NC cross sections are sensitive to the Z-exchange, resulting in
$\sigma_{NC}(e^-p) \neq \sigma_{NC}(e^+p)$ as was seen in Sec. 1.3. The NC cross sections have been measured at
high $Q^2$ both in $e^+p$ and in $e^-p$ collisions [75, 117-122], as shown in Fig. 6. The contribution of Z
exchange is clearly visible for $Q^2$ above about $10^3$ GeV$^2$, with the $\gamma Z$ interference being
constructive (destructive) in $e^-p$ ($e^+p$) collisions. The difference between both measurements gives
access to the structure function $xF_3$ which is a direct measure of the valence quark distributions
(see Eqs. 7 and 11). The H1 measurement using the full HERA-II luminosity [122] is shown in
Fig. 6 (right).

HERA collider operation concluded with data taking runs at two reduced proton beam energies
in order to facilitate a direct measurement of $F_L$. This structure function gives a larger
contribution to the cross section with increasing y (see Eq. 7). It can therefore be determined by
measuring the differential cross section at different $\sqrt{s}$, i.e. at the same x and $Q^2$ but different
y. Measurements from H1 and ZEUS have been published [123, 124] covering the low x region
of $3 \times 10^{-5}$ - $10^{-2}$ and $Q^2$ from 1.5 - 120 GeV$^2$.
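
The extraction principle can be illustrated with a short numerical sketch. It assumes the
standard low $Q^2$ form of the reduced NC cross section, $\sigma_r = F_2 - (y^2/Y_+) F_L$ with
$Y_+ = 1 + (1-y)^2$ and $y = Q^2/(xs)$; this form is an assumption made here for illustration and is
not quoted from Eq. 7. The centre-of-mass energies, the toy values of $F_2$ and $F_L$ and the
function names are hypothetical.

```python
import numpy as np

def inelasticity(x, q2, sqrt_s):
    """y = Q^2 / (x s), neglecting particle masses."""
    return q2 / (x * sqrt_s**2)

def extract_f2_fl(x, q2, sqrt_s_values, sigma_r):
    """Straight-line fit of sigma_r = F2 - f(y) * FL, with f(y) = y^2 / Y_+.

    Returns (F2, FL): the intercept and minus the slope of the fitted line.
    """
    y = inelasticity(x, q2, np.asarray(sqrt_s_values, dtype=float))
    f = y**2 / (1.0 + (1.0 - y)**2)
    slope, intercept = np.polyfit(f, np.asarray(sigma_r, dtype=float), 1)
    return intercept, -slope

# Illustrative inputs (not measured values): one (x, Q^2) point seen at
# three centre-of-mass energies, hence at three different values of y.
x, q2 = 5.0e-4, 25.0              # Q^2 in GeV^2
sqrt_s = [318.0, 251.0, 225.0]    # GeV, hypothetical beam settings
f2_true, fl_true = 1.20, 0.25     # toy structure function values

y = inelasticity(x, q2, np.array(sqrt_s))
sigma_r = f2_true - y**2 / (1.0 + (1.0 - y)**2) * fl_true

f2_fit, fl_fit = extract_f2_fl(x, q2, sqrt_s, sigma_r)
print(f2_fit, fl_fit)
```

With noise-free toy input the fit returns the input $F_2$ and $F_L$; in a real analysis the lever arm
in y and the relative normalisation of the datasets limit the achievable precision.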



2.3.3 Charged Current DIS measurements 

Measurements of charged current DIS provide important constraints on the flavour separation,
which are missing from the measurement of $F_2$ alone, as the latter mostly constrains one single
combination of PDFs ($4U + D$). Indeed (see Eq. 12 and Eqs. 13-14), $\sigma^-_{CC}$ goes as
$(1-y)^2 x\bar{D} + xU$ and probes mainly the u density, while $\sigma^+_{CC}$ goes as $(1-y)^2 xD + x\bar{U}$ and
probes mainly the d density, with some constraints being also set on $\bar{U}$ via the high y
measurements. An example of CC measurements is shown in Fig. 7. Although the statistical precision
of the HERA-II CC measurements [122, 125, 126] is much better than what was achieved with
HERA-I [127, 128], these measurements remain statistically limited. For example, the precision
reaches ~10% for x ~ 0.1. Despite this moderate precision, the constraints brought by CC DIS at
HERA are interesting since the experimental input is completely free of any correction, in
contrast to those obtained by comparing DIS measurements on a proton and a deuterium target.

Figure 7: The CC DIS cross section measured in $e^+p$ collisions. The overlaid curves show how
these measurements disentangle the contributions from up and down quarks. From [126].



2.3.4 The averaged H1 and ZEUS DIS dataset

Recently the two collaborations have embarked on a programme of data combination leading to
joint publications of combined data which profit from improved uncertainties over the
individual measurements. A novel, model independent, statistical method has been employed, which
was introduced in [129] and further refined in [130]. By taking into account the variations of
the measurements arising from different experimental sources of uncertainty an improvement
in the statistical and systematic uncertainties is obtained. This arises from the fact that each
experiment uses different methods of measurement and each method can act as a calibration of
the other.

The only assumption of the averaging method is that both experiments measure the same
quantity or cross section at a given x and $Q^2$. The averaging procedure is based on the
minimisation of a $\chi^2$ function with respect to the physical cross sections in all (x, $Q^2$) bins of the
measurement. Each experimental systematic error source is assigned a nuisance parameter with
a corresponding penalty term in the $\chi^2$ function to restrict large deviations of the parameter
from zero. These parameters induce coherent shifts of the measured cross sections according
to the correlated systematic uncertainties provided by the experiments. In an ideal case the
fitted nuisance parameters should be Gaussian distributed with a mean of zero and a variance
of one.
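
A minimal sketch of such an averaging is given below. It is not the implementation of [129, 130]:
it assumes a simplified $\chi^2$ in which each correlated source shifts the measurements linearly and
additively, so that the minimisation reduces to a linear least squares problem. The function name
`combine`, the array layout and all numbers are hypothetical.

```python
import numpy as np

def combine(mu, delta, gamma):
    """Average several experiments by minimising a chi^2 with nuisance parameters.

    mu    : (n_exp, n_bins) measured cross sections
    delta : (n_exp, n_bins) uncorrelated uncertainties
    gamma : (n_exp, n_bins, n_sys) shift of each measurement for a one
            standard deviation change of each correlated source
    Model : mu[e, i] = m[i] + sum_j gamma[e, i, j] * b[j]
    chi^2 = sum_{e,i} (mu - model)^2 / delta^2 + sum_j b[j]^2
    """
    n_exp, n_bins, n_sys = gamma.shape
    rows, rhs = [], []
    for e in range(n_exp):
        for i in range(n_bins):
            row = np.zeros(n_bins + n_sys)
            row[i] = 1.0 / delta[e, i]
            row[n_bins:] = gamma[e, i] / delta[e, i]
            rows.append(row)
            rhs.append(mu[e, i] / delta[e, i])
    for j in range(n_sys):          # penalty terms pulling b[j] towards zero
        row = np.zeros(n_bins + n_sys)
        row[n_bins + j] = 1.0
        rows.append(row)
        rhs.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution[:n_bins], solution[n_bins:]

# Toy inputs (illustrative numbers only): two experiments, three bins,
# one correlated normalisation-like source per experiment.
mu = np.array([[1.03, 0.82, 0.61],
               [0.99, 0.80, 0.60]])
delta = np.full((2, 3), 0.02)
gamma = np.zeros((2, 3, 2))
gamma[0, :, 0] = 0.03 * mu[0]   # source 0 affects experiment 0 only
gamma[1, :, 1] = 0.01 * mu[1]   # source 1 affects experiment 1 only

m, b = combine(mu, delta, gamma)
print("combined values:", m)
print("nuisance shifts:", b)
```

When all entries of `gamma` vanish the result reduces to the usual weighted mean in each bin; with
correlated sources the fit moves the experiment with the larger systematic uncertainty towards the
other one, which is the cross-calibration effect described above.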

Several types of cross section measurement can be combined simultaneously e.g. NC $e^+p$, NC
$e^-p$, CC $e^+p$ and CC $e^-p$, yielding four independent datasets all of which benefit from a
reduction in the uncertainty. In this case the reduction arises from correlated sources of uncertainty
common to all cross section types. This data combination method has been described in detail
and used in several publications [60, 61, 130].

This procedure also has the advantage of producing a single set of combined data for each cross
section type, which makes the data much easier to handle in QCD fits. The
first such combination of H1 and ZEUS inclusive neutral and charged current cross sections has
been published using HERA-I data [61]. Further combination updates are expected to follow as
final cross sections using HERA-II data are published by the individual experiments.

As an example Fig. 8 shows the neutral current cross section for unpolarised $e^+p$ scattering. The
combined data are shown compared to the individual H1 and ZEUS measurements. The overall
measurement uncertainties are reduced at high x mainly from improved statistical uncertainties.
However at low x where the data precision is largely limited by systematic uncertainties, a clear
improvement is also visible. In the region of $Q^2$ ~ 30 GeV$^2$ the overall precision on the
combined NC cross sections has reached 1.1% [61]. In the CC $e^+p$ channel the measurement
accuracy is limited by the statistical sample sizes and the combined data reduces the uncertainty
to about 10% for x ~ 0.1. A further significant reduction in uncertainty is expected once the
combination of H1 and ZEUS data including the complete HERA-II datasets is available.

The combined HERA datasets have been used in QCD analyses [61] to determine proton PDFs 
with HERA data alone. This is described in more detail in section 3.4.1. 

2.3.5 Heavy flavour measurements: $F_2^{c\bar{c}}$ and $F_2^{b\bar{b}}$

The charm and beauty contents of the proton have been measured at HERA via exclusive
measurements (exploiting for example the $D^* \to D^0 \pi_{\rm slow} \to K \pi \pi_{\rm slow}$ decay chain, or the
$b \to \mu X$ decays, see e.g. [131-134]), and via semi-inclusive measurements which exploit the long
lifetime of the charmed and beauty hadrons, using silicon vertex devices around the interaction
points [135, 136]. Figure 9 (left) shows the $F_2^{b\bar{b}}$ measured by the H1 experiment [135]. Just
as for the inclusive $F_2$, it shows large scaling violations at low x. In Fig. 9 (right), the charm
fraction in the proton is shown to be about 20% independently of $Q^2$, while the beauty fraction
increases rapidly with $Q^2$, reaching ~1% at high $Q^2$. The precision of the measurements of $F_2^{c\bar{c}}$
and $F_2^{b\bar{b}}$ is about 15% and 30%, respectively. These measurements provide an important
test of the theoretical schemes within which observables involving heavy flavours are calculated
(see section 3.2.2).

Figure 8: HERA combined NC $e^+p$ reduced cross section as a function of $Q^2$ for six x-bins,
compared to the separate H1 and ZEUS data input to the averaging procedure. The individual
measurements are displaced horizontally for better visibility. From [61].

2.3.6 Dedicated measurements at very low Q 2 

Extending the $F_2$ measurements down to very low $Q^2$ requires dedicated techniques or
detectors. The squared momentum transfer can be written as $Q^2 = 2 E_e^0 E_e (1 + \cos\theta_e)$, where
$E_e^0$ denotes the energy of the incoming lepton in the laboratory frame, $E_e$ that of the scattered
lepton, and $\theta_e$ is the angle of the scattered lepton with respect to the direction of the incoming
proton. Thus it can be seen by inspection that to go down to low $Q^2$, one needs to access larger
angles $\theta_e$, or to lower the incoming electron energy. This can be achieved by:

• using a dedicated apparatus, such as the ZEUS Beam Pipe Tracker (BPT), which consisted of a
silicon strip tracking detector and an electromagnetic calorimeter very close to the beam
pipe in the backward electron direction;

• shifting the interaction vertex in the forward direction. Two short runs were taken with
such a setting, where the nominal interaction point was shifted by 70 cm;

• exploiting QED Compton events: when the lepton is scattered at a large angle $\theta_e$, it may
still lead to an observable electron (i.e. within the detector acceptance) if it radiates a
photon;

• using events with initial state photon radiation which can lower the incoming electron
energy, $E_e^0 \to E_e^0 - E_\gamma$, where $E_\gamma$ is the energy of the radiated photon.

Figure 9: Left: $F_2^{b\bar{b}}$ measured by the H1 experiment using a lifetime technique. Right: the
fractions of charm and beauty in the proton, derived from the same analysis. From [135].
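
The dependence of $Q^2$ on the scattering angle and on the incoming lepton energy can be made
concrete with a few lines of code. The sketch below simply evaluates
$Q^2 = 2 E_e^0 E_e (1 + \cos\theta_e)$; the beam energy of 27.6 GeV, the scattered energy of 20 GeV
and the function name are assumptions chosen for illustration only.

```python
import math

def q2_electron_method(e_beam, e_scattered, theta_deg):
    """Q^2 = 2 E0 E' (1 + cos(theta)); theta is measured w.r.t. the proton direction."""
    return 2.0 * e_beam * e_scattered * (1.0 + math.cos(math.radians(theta_deg)))

# Larger polar angles (closer to the backward beam pipe) give lower Q^2 ...
for theta in (165.0, 172.0, 178.0):
    print(theta, q2_electron_method(27.6, 20.0, theta))

# ... and so does a lower effective beam energy, e.g. after initial state radiation.
for e_gamma in (0.0, 5.0, 10.0):
    print(e_gamma, q2_electron_method(27.6 - e_gamma, 20.0, 172.0))
```

For simplicity the scattered lepton energy is held fixed in both loops, although in a real event it
is correlated with the other variables.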



All these methods have been exploited at HERA [137]. In particular, it was observed that $F_2$
continues to rise at low x, even at the lowest $Q^2$, $Q^2$ ~ 0.5 GeV$^2$. Note that these measurements
are usually not included in QCD analyses determining parton distribution functions, since they
fail the lower cut in $Q^2$ that is usually applied to DIS measurements, to ensure that they are not
affected by non-perturbative effects.

Figure 10: Differential cross sections of inclusive jet production in DIS measured by the ZEUS
experiment, together with their systematic uncertainties. From [142].



2.3.7 Jet cross sections in DIS at HERA 



The H1 and ZEUS experiments have measured inclusive jet cross sections in the so-called
"Breit" frame as a function of several variables, for example differentially with respect to the
jet energy and in several $Q^2$ bins [138-142]. The Breit frame [143] is of particular interest for
jet measurements at HERA since it provides a maximal separation between the products of the
beam fragmentation and the hard jets. In this frame, the exchanged virtual boson $V^*$ is purely
space-like and is collinear with the incoming parton, with q = (0, 0, -Q). For parton-model
processes, $V^* q \to q$, the virtual boson is absorbed by the struck quark, which is back-scattered
with zero transverse momentum with respect to the $V^*$ direction. On the other hand, for $O(\alpha_s)$
processes like QCD-Compton scattering ($V^* q \to qg$) and boson-gluon fusion ($V^* g \to q\bar{q}$),
the two final-state partons lead to jets with a non-vanishing transverse momentum. Hence, the
inclusive requirement of at least one jet in the Breit frame with a high transverse momentum
selects $O(\alpha_s)$ processes and suppresses parton-model processes. Example measurements of
inclusive jet production in DIS obtained by the ZEUS experiment are shown in Fig. 10. With
small systematic uncertainties of typically ~5%, such data can bring constraints on the gluon
density in the medium x range, x = 0.01 - 0.1. However, when included in global QCD fits
which also include other jet data, the impact of these measurements is limited [38].

Jet production has also been measured in the photoproduction regime of $Q^2 \to 0$. However,
these measurements are usually not included in QCD analyses of the proton structure because
of their potential sensitivity to the photon parton densities.

2.4 Experiments with hadronic beams



In interactions where no lepton is involved in the initial state, the cross sections depend on
products of parton distribution functions as shown by Eq. 6. Hadro-production experiments,
using either a fixed target or two colliding beams, provide a wealth of measurements that nicely
complement those made in lepto-production. In particular, they set specific constraints on some
parton distribution functions that are not directly accessed in DIS experiments. The
corresponding measurements, performed by fixed target experiments and by the D0 and CDF experiments
at the Tevatron collider, are described in this section.



2.4.1 Kinematics in hadro-production 

In pp, pd or $p\bar{p}$ collisions, the production of a final state of invariant mass M involves two
partons with Bjorken-x values $x_1$ and $x_2$ related by

$M^2 = x_1 x_2 s$    (30)

where s denotes the square of the energy in the centre of mass of the hadronic collision. The
minimum value of $x_{1,2}$ is thus $x_{min} = \tau$ with

$\tau = M^2/s$    (31)

In the centre-of-mass frame of the two hadrons and neglecting the hadron masses, the rapidity y
of the final state X is

$y = \frac{1}{2} \ln \frac{E + p_z}{E - p_z} = \frac{1}{2} \ln \frac{x_1}{x_2}$    (32)

where the hadron that leads to the parton with momentum fraction $x_1$ defines the positive
direction along the beam(s) axis. Hence $x_1$ and $x_2$ can be written as

$x_1 = \sqrt{\tau}\, e^{y}$, $x_2 = \sqrt{\tau}\, e^{-y}$    (33)

In fixed target experiments, the positive direction is usually defined by the direction of the
incident beam, such that $x_1$ denotes the Bjorken-x of the parton in the beam hadron, and $x_2$
that of the parton in the target hadron. In $p\bar{p}$ collisions, the positive direction can be defined
by the proton beam, in which case $x_1$ ($x_2$) denotes the Bjorken-x of the parton in the proton
(anti-proton).
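
As a quick numerical illustration of Eqs. 30-33, the sketch below evaluates the two momentum
fractions for a given mass, rapidity and centre-of-mass energy. The function name and the example
values (an 8 GeV di-muon pair at $\sqrt{s}$ = 38.8 GeV, and a W boson of mass 80.4 GeV at
$\sqrt{s}$ = 1960 GeV) are chosen for illustration only.

```python
import math

def parton_fractions(mass, rapidity, sqrt_s):
    """Leading-order x1, x2 for a final state of given invariant mass and rapidity."""
    sqrt_tau = mass / sqrt_s
    return sqrt_tau * math.exp(rapidity), sqrt_tau * math.exp(-rapidity)

# Illustrative cases: (label, mass in GeV, rapidity, sqrt(s) in GeV)
for label, mass, y, sqrt_s in [
    ("fixed target di-muon", 8.0, 0.5, 38.8),
    ("Tevatron W, central", 80.4, 0.0, 1960.0),
    ("Tevatron W, forward", 80.4, 2.0, 1960.0),
]:
    x1, x2 = parton_fractions(mass, y, sqrt_s)
    print(f"{label}: x1 = {x1:.3f}, x2 = {x2:.4f}")
```

By construction the product $x_1 x_2 s$ returns $M^2$ and $\frac{1}{2}\ln(x_1/x_2)$ returns the
input rapidity.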



2.4.2 Drell-Yan di-muon production in fixed target experiments 

The experiments E605, E772 and E866/NuSea have measured di-muon production in Drell-Yan
interactions of a proton off a fixed target. They used an 800 GeV proton beam extracted
from the Fermilab Tevatron accelerator that was transported to the east beamline of the Meson
experimental hall. While changes were made to the spectrometer for E772 and E866/NuSea,
the basic design has remained the same since the spectrometer was first used for E605 in the
early 1980s. The core consists of three large dipole magnets that allow the momentum of
energetic muons to be measured and deflect soft particles out of the acceptance. Different
targets have been employed: copper for E605, liquid deuterium for E772 and E866, and liquid
hydrogen for E866. The centre-of-mass energy of the Drell-Yan process for these experiments
is $\sqrt{s} = 38$ GeV. A broad range of di-lepton invariant mass M could be covered, extending up
to M ~ 20 GeV.

Differential cross sections. The experiments published [144, 145] double-differential cross
sections in M and either in the rapidity of the di-lepton pair y, or in Feynman $x_F$, defined as
$x_F = 2 q_z / \sqrt{s}$ where q denotes the four-vector of the Drell-Yan pair in the hadronic
centre-of-mass frame and $q_z$ its projection on the longitudinal axis. At leading order, $x_F = x_1 - x_2$ and
the leading order differential cross sections can be written as:

$\frac{d^2\sigma}{dM^2 dy} = \frac{4\pi\alpha^2}{9 M^2 s} \sum_i e_i^2 \left[ q_i(x_1) \bar{q}_i(x_2) + \bar{q}_i(x_1) q_i(x_2) \right]$    (34)

$\frac{d^2\sigma}{dM^2 dx_F} = \frac{1}{x_1 + x_2} \frac{d^2\sigma}{dM^2 dy}$    (35)

The experiments made measurements in the range 4.5 < M < 14 GeV and 0.02 < $x_F$ < 0.75,
corresponding to $x_1$ ~ 0.1 - 0.8 and $x_2$ ~ 0.01 - 0.3, the acceptance of the detector being larger
for $x_1 \gg x_2$. In this domain, the first term dominates in Eq. 34, and the measurements bring
important information on the sea densities $\bar{u}(x)$ and $\bar{d}(x)$, especially for x larger than about 0.1
where DIS experiments poorly constrain the sea densities.



The ratio pp/pd from E866. E866/NuSea made measurements using both a deuterium and
a hydrogen target [146-148] from which ratios of the differential cross sections $\sigma^{pp}/\sigma^{pd}$ could
be extracted. These measurements have brought an important insight on the asymmetry $\bar{d} - \bar{u}$
at low x. Indeed, the cross sections in the phase space where $x_1 \gg x_2$ can be written as:

$\sigma^{pp} \propto \frac{4}{9} u(x_1) \bar{u}(x_2) + \frac{1}{9} d(x_1) \bar{d}(x_2)$    (36)

$\sigma^{pn} \propto \frac{4}{9} u(x_1) \bar{d}(x_2) + \frac{1}{9} d(x_1) \bar{u}(x_2)$    (37)

such that:

$\frac{\sigma^{pd}}{2\sigma^{pp}} \simeq \frac{1}{2} \, \frac{1 + \frac{1}{4} \frac{d(x_1)}{u(x_1)}}{1 + \frac{1}{4} \frac{d(x_1)}{u(x_1)} \frac{\bar{d}(x_2)}{\bar{u}(x_2)}} \left[ 1 + \frac{\bar{d}(x_2)}{\bar{u}(x_2)} \right]$    (38)

In the relevant domain of $x_1$, the ratio $d(x_1)/u(x_1)$ is quite well known, such that the ratio
$\sigma^{pd}/2\sigma^{pp}$ gives access to the ratio $\bar{d}/\bar{u}$ at low and medium x, x ~ 0.01 - 0.3.
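
The leading-order relation between the measured cross section ratio and $\bar{d}/\bar{u}$ can be
inverted in closed form. The sketch below does this under the same $x_1 \gg x_2$ approximation as
above; writing $r = \sigma^{pd}/2\sigma^{pp}$ and $a = \frac{1}{4} d(x_1)/u(x_1)$, one finds
$\bar{d}/\bar{u} = (2r - 1 - a)/(1 + a - 2ra)$. The function names and input numbers are
hypothetical, and the actual E866 extraction is more involved than this illustration.

```python
def sigma_ratio(dbar_over_ubar, d_over_u):
    """sigma_pd / (2 sigma_pp) at leading order for x1 >> x2."""
    a = 0.25 * d_over_u
    return 0.5 * (1.0 + a) * (1.0 + dbar_over_ubar) / (1.0 + a * dbar_over_ubar)

def dbar_over_ubar_from_ratio(r, d_over_u):
    """Invert sigma_ratio for dbar(x2)/ubar(x2), given d(x1)/u(x1)."""
    a = 0.25 * d_over_u
    return (2.0 * r - 1.0 - a) / (1.0 + a - 2.0 * r * a)

# Toy numbers: a flavour-symmetric sea gives a ratio of exactly one,
# while an excess of dbar over ubar pushes the ratio above one.
print(sigma_ratio(1.0, 0.5))
print(sigma_ratio(1.4, 0.5))
print(dbar_over_ubar_from_ratio(sigma_ratio(1.4, 0.5), 0.5))
```

A measured ratio above one therefore signals directly that $\bar{d} > \bar{u}$ at the probed $x_2$.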

This idea was first used by the NA51 experiment at CERN [149], which confirmed the
indication, previously obtained by the NMC experiment, that $\bar{d} \neq \bar{u}$ (see section 2.2.3). But the
acceptance of the NA51 detector was limited, and their result for $\bar{d}/\bar{u}$ ($\bar{d}/\bar{u}$ ~ 2) could be given
for a single x value only.

The E866 experiment was the first to measure the x-dependence of $\bar{d}/\bar{u}$. Fig. 11 (left) shows the
obtained measurement [147], which extends down to $x_2$ ~ 0.03. Note the spread of the theoretical
predictions before these data were included in the fits. The ratio $\bar{d}/\bar{u}$ as extracted by E866 is
shown in Fig. 11 (right), and clearly demonstrates that $\bar{d} > \bar{u}$. The asymmetry between $\bar{d}$ and $\bar{u}$
is largest for x ~ 0.2 and decreases with decreasing x; what happens as $x \to 1$ remains unclear.

Figure 11: Left: The ratio $\sigma^{pd}/2\sigma^{pp}$ measured by the E866 experiment. Right: The ratio $\bar{d}/\bar{u}$
extracted from this measurement. The previous result from the NA51 experiment is also shown
as the open square. From [147].



2.4.3 The D0 and CDF experiments at Fermilab



The D0 and CDF experiments were located at the Tevatron collider at Fermilab, which collided
protons and anti-protons. In a first phase of operation ("Run I", from 1992 to 1998), the Tevatron
was operated at a centre-of-mass energy of 1.8 TeV. The second phase, "Run II", started in 2001
following significant upgrades of the accelerator complex and of the experiments, with a
centre-of-mass energy of 1.96 TeV. Data taking stopped in 2011.

The measurements of the D0 and CDF experiments provide several important constraints on
proton structure:



• measurements of lepton charge asymmetry from W decays bring constraints on the ratio 
d/u at x > 0.05, and hence on the d density, which is less well known than the u density; 

• measurements of the Z rapidity distribution in $Z \to l^+ l^-$ decays bring constraints on
the quark densities at x > 0.05, which are complementary to those obtained from DIS
measurements;

• the cross sections for inclusive jet production in several rapidity bins provide constraints 
on the gluon and the quark distributions for 0.01 < x < 0.5. In particular, they set the 
strongest constraints on the gluon density at high x. 



A detailed description of the D0 detector can be found in [150]. The innermost part is a central
tracking system surrounded by a 2 T superconducting solenoidal magnet. The two components
of the central tracking system, a silicon microstrip tracker and a central fibre tracker, are used
to reconstruct interaction vertices and provide the measurement of the momentum of charged
particles in the pseudo-rapidity range $|\eta| < 2$. The tracking system and magnet are followed
by the calorimetry system that consists of electromagnetic and hadronic uranium-liquid argon
sampling calorimeters. Outside of the D0 calorimeter lies a muon system which consists of
layers of drift tubes and scintillation counters and a 1.8 T toroidal magnet.

The CDF II detector is described in detail in [151]. The detector has a charged particle tracking
system that is immersed in a 1.4 T solenoidal magnetic field coaxial with the beam line, and
provides coverage in the pseudo-rapidity range $|\eta| < 2$. Segmented sampling calorimeters,
arranged in a projective tower geometry, surround the tracking system and measure the energy
of interacting particles for $|\eta| < 3.6$.



Lepton charge asymmetry from W decays. In pp or $p\bar{p}$ collisions, the production of $W^+$
bosons proceeds mainly via $u\bar{d}$ interactions, or via $d\bar{u}$ for $W^-$ production. At large boson
rapidity $y_W$, the interaction involves one parton with $x = \sqrt{\tau} \exp|y_W|$ (see Eq. 33) where
$\sqrt{\tau} = M_W/\sqrt{s} \sim 0.04$. In $p\bar{p}$ collisions at the Tevatron, this medium to high x parton is most
likely to be a u quark picked up from the proton in the case of $W^+$ production, or a $\bar{u}$ anti-quark
from the anti-proton in $W^-$ production; this follows from the fact that u(x) > d(x) at medium
and high x. Hence, $W^+$ bosons are preferably emitted in the direction of the incoming proton
and $W^-$ bosons in the anti-proton direction, leading to an asymmetry between the rapidity
distributions of $W^+$ and $W^-$ bosons. This asymmetry can be written as:

$A(y_W) = \frac{d\sigma(W^+)/dy_W - d\sigma(W^-)/dy_W}{d\sigma(W^+)/dy_W + d\sigma(W^-)/dy_W}$    (39)

$\simeq \frac{u(x_1) d(x_2) - d(x_1) u(x_2)}{u(x_1) d(x_2) + d(x_1) u(x_2)} = \frac{R(x_2) - R(x_1)}{R(x_2) + R(x_1)} \simeq -y_W \sqrt{\tau} \, \frac{R'(\sqrt{\tau})}{R(\sqrt{\tau})}$    (40)

where R(x) = d(x)/u(x) and R' denotes the derivative of R. It can be seen from Eq. 40 that
the W charge asymmetry is directly sensitive to the d/u ratio in the range x ~ 0.01 - 0.3, and
to its slope at x ~ 0.04.
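
The quality of the small-rapidity approximation can be checked numerically. The sketch below
compares the ratio form $[R(x_2) - R(x_1)]/[R(x_2) + R(x_1)]$ with the expansion
$-y_W \sqrt{\tau} R'(\sqrt{\tau})/R(\sqrt{\tau})$ obtained from it by expanding $x_{1,2} = \sqrt{\tau} e^{\pm y_W}$
to first order in $y_W$. The linear shape chosen for R(x), the W mass of 80.4 GeV and the function
names are assumptions made for illustration; no fitted PDF is used.

```python
import math

SQRT_TAU = 80.4 / 1960.0   # M_W / sqrt(s), roughly 0.04 at Tevatron Run II

def toy_ratio(x):
    """Toy d/u ratio falling with x; an assumed shape, not a fitted PDF."""
    return 0.6 * (1.0 - x)

def asymmetry_exact(y_w, ratio=toy_ratio, sqrt_tau=SQRT_TAU):
    """[R(x2) - R(x1)] / [R(x2) + R(x1)] with x1,2 = sqrt(tau) exp(+-y_W)."""
    x1 = sqrt_tau * math.exp(y_w)
    x2 = sqrt_tau * math.exp(-y_w)
    return (ratio(x2) - ratio(x1)) / (ratio(x2) + ratio(x1))

def asymmetry_small_y(y_w, ratio=toy_ratio, sqrt_tau=SQRT_TAU, h=1e-6):
    """First-order expansion: -y_W sqrt(tau) R'(sqrt(tau)) / R(sqrt(tau))."""
    slope = (ratio(sqrt_tau + h) - ratio(sqrt_tau - h)) / (2.0 * h)
    return -y_w * sqrt_tau * slope / ratio(sqrt_tau)

for y_w in (0.1, 0.2, 0.5, 1.0):
    print(y_w, asymmetry_exact(y_w), asymmetry_small_y(y_w))
```

Because R falls with x, the asymmetry is positive at positive rapidity, in line with $W^+$ bosons
being emitted preferentially along the proton direction.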

This asymmetry remains, though diluted, when measuring the experimentally observable
rapidity of the charged lepton coming from the W decay [153-156] (see footnote 8). Figure 12 shows
example measurements from the CDF experiment, in two bins of the transverse energy $E_T$ of the lepton.
At low $E_T$, the measured asymmetry is also sensitive to the anti-quark densities, via subleading
interactions involving an anti-quark coming from the proton and a quark from the anti-proton,
which were neglected in the approximate formula Eq. 40.



8 A measurement of $A(y_W)$ was actually performed in [152].

Figure 12: Lepton charge asymmetry from W decays, as a function of the pseudo-rapidity of
the charged lepton. The asymmetry is shown for two ranges of the lepton transverse energy.
From [154].

Rapidity distribution in $Z \to l^+ l^-$ events. The large integrated luminosity delivered by the
Tevatron allows the Z rapidity distribution to be precisely measured by the D0 and CDF
experiments [157, 158]. The $Z/\gamma^*$ rapidity distribution is measured in a di-lepton mass range around
the Z boson mass, extending up to |y| ~ 3. The measurements provide constraints on the
quark densities at $Q^2 \sim M_Z^2$, over a broad range in x. Neglecting the $Z/\gamma^*$ interference terms,
which are small in the considered mass range and well below the experimental uncertainties,
the differential cross section reads as:

$\frac{d\sigma}{dy} = \frac{\pi G_F M_Z^2 \sqrt{2}}{3s} \sum_i c_i \left[ q_i(x_1) \bar{q}_i(x_2) + \bar{q}_i(x_1) q_i(x_2) \right]$    (41)

where $c_i = v_i^2 + a_i^2$ is the sum of the squares of the vector and axial couplings of the quarks to
the Z boson. Hence, the measured cross sections mainly probe the combination

$\sim 0.29\, u(x_1) u(x_2) + 0.37\, d(x_1) d(x_2)$    (42)

complementary to the combination $\sim 4u\bar{u} + d\bar{d}$ probed by pp Drell-Yan production in fixed
target experiments (see section 2.4.2 and Eq. 36). These D0 and CDF measurements bring
interesting constraints on the d distribution and, in the forward region, on the quark densities at
high x.



Inclusive jet cross sections. The Tevatron measurements of the jet production cross section
with respect to the jet transverse momentum $p_T$, $d^2\sigma/dp_T dy$, in several bins of jet rapidity y,
provide constraints on the quark and gluon densities for x larger than a few $10^{-2}$. For example,
in the central rapidity region, the production of jets with $p_T$ = 200 GeV involves partons
with x ~ 0.2, and at least one of them is a gluon in ~70% of interactions. Hence, these
measurements provide crucial constraints on the gluon density at high x.

The jet measurements from Run I [159, 160] preferred a rather high gluon density at high x,
in some tension with the other experimental measurements available at that time, as discussed
in section 3.5.4. As will be seen in section 3, this tension is much reduced with the Run II
measurements [161-164]. An example of these measurements, from the D0 collaboration, is
shown in Fig. 13. These measurements are presented in six bins of jet rapidity extending out
to |y| = 2.4. The cross section extends over more than eight orders of magnitude from $p_T$ =
50 GeV to $p_T$ > 600 GeV. Compared to previous Run I results, the systematic uncertainties
have been reduced by up to a factor of two, to typically ~10 - 15%. This has been made
possible by extensive studies of the jet response, which led to a relative uncertainty of the jet
$p_T$ calibration of about 1% for jets measured in the central calorimeter, for $p_T$ in 150 - 500 GeV.

Figure 13: Inclusive jet cross section as a function of jet $p_T$, as measured by the D0 experiment.
From [162].

2.4.4 Prompt photon production 

In hadronic interactions, the production of prompt photons, i.e. photons that do not arise from
the decay of a hadron produced in the interaction, is sensitive to the gluon density via the QCD
Compton process $qg \to \gamma q$. However, measurements of inclusive prompt photon production
performed at low energy ($\sqrt{s}$ = 20 - 40 GeV) by the fixed target E706 experiment [165-167]
could not easily be included in QCD analyses of proton PDFs, as they were systematically
higher than theoretical predictions [168, 169]. Consequently, once precise jet measurements
from the Tevatron experiments became available, they were used instead of the prompt photon
data to constrain the gluon density at medium and high x, and the usage of prompt photon
measurements in QCD fits was abandoned.

Since then, the compatibility of prompt photon measurements with pQCD predictions has been
discussed at length, and the current status is reviewed in [170]. Compared to older
measurements, those performed in hadronic collisions at higher $\sqrt{s}$ are less affected by
non-perturbative effects such as intrinsic $k_T$ broadening [166, 171]. Moreover, the requirement
that the photon be isolated reduces the contribution of photons coming from fragmentation
processes. These fragmentation photons are less well understood and are subject to large
uncertainties. The analysis performed in [170] considered the measurements of isolated prompt
photon production carried out by (see footnote 9):

• the Tevatron experiments, at $\sqrt{s}$ = 0.63 - 1.96 TeV; photons with a transverse energy
between 7 and 400 GeV were measured, corresponding to a range in x of $10^{-3}$ to 0.4;

• the UA1 experiment at the CERN $Sp\bar{p}S$, at $\sqrt{s}$ = 0.55 - 0.63 TeV; photons of 12 to 100
GeV were measured, covering a range of 0.01 to 0.5 in x;

• the PHENIX experiment at the RHIC collider at the Brookhaven National Laboratory.
RHIC is the only collider that can collide polarised protons, as well as several species
of heavy ions such as gold or uranium. In [172], unpolarised pp cross sections were
reported at $\sqrt{s}$ = 200 GeV, by averaging over the spin states of the proton beams, for the
production of photons of 3 to 16 GeV, corresponding to a range in x of 0.03 to 0.2.

9 Recent measurements made by the ATLAS and CMS experiments at the LHC were also studied in [170] and
are described in section 4.3.5.

Generally a good agreement with NLO pQCD predictions has been found. However, these data 
are not included in the QCD fits described in section 3.5. 

2.5 Summary 

The complete range of measurements used in proton structure determinations now spans six
orders of magnitude in both x and $Q^2$ and is shown in Fig. 14. The region of high x and low $Q^2$
is covered by the fixed target data with charged lepton and neutrino DIS experiments as well as
proton beams on nuclear targets. The large $\sqrt{s}$ of the HERA collider provides access to a wide
kinematic range. Finally the range of the data from the Tevatron $p\bar{p}$ collider, operating at
$\sqrt{s}$ = 1.96 TeV, is shown, providing access to $Q^2 \sim 10^5$ GeV$^2$ through the inclusive jet
measurements. A summary of the main features of the experimental measurements is provided in Tab. 1.

The PDF flavour decomposition of the proton is achieved by combining data from different
types of experiment, each of which brings its own constraints; these are summarised in Tab. 2. The
fixed target charged lepton DIS data provide stringent constraints on the light quark PDFs in the
valence region of x > 0.1 as well as medium x ~ 0.01. The combination of $F_2$ measurements
in NC DIS off a proton and a deuterium target provides the primary U-type and D-type flavour
separation. For example the ratio of the $F_2$ measurements on deuterium and proton targets from
NMC constrains the ratio of u/d in a rather wide x domain. Measurements of Drell-Yan di-muon
production are sensitive to the combination $q\bar{q}$ which at high x constrains the anti-quarks.
Neutrino induced DIS structure functions from CCFR, NuTeV and CHORUS disentangle the
contribution of sea quarks from that of valence quarks, through the measurements of the CC
structure functions $W_2$ and $xW_3$ at x > 0.01. The di-muon production measurements allow the s and
$\bar{s}$ components to be ascertained via the process $\nu N \to \mu^+ \mu^- + X$ (and the charge conjugate
reaction) mediated by $W^+ s$ and $W^- \bar{s}$ fusion.

The fixed target data generally benefit from large event rates in the high x region but can be 
complicated by the use of nuclear targets. The HERA DIS data are not affected by these issues 
and the NC measurements place tight constraints on the low x gluon distribution as well as the 
sea quark PDFs at low x. The CC measurements have moderate statistical precision but provide 
sufficient flavour separation to allow PDF extractions using only HERA data. 

The existing measurements of $F_2$ are summarised in Fig. 15, which demonstrates the scaling
violations that are most prominent at low $x$. The charged lepton $xF_3$ data are consistent, albeit
within large uncertainties, as shown in Fig. 16. The neutrino induced DIS measurements, also
shown, have much greater precision. Direct measurements of the $F_L$ structure function are also
shown in Fig. 16. The precise fixed target data at high $x$ are compared to the HERA measurements
lying at much lower $x$ with much larger uncertainties. There is good overall consistency
between the variety of measurements from different experiments. The compatibility between the
various experimental datasets is further discussed in section 3.5.6.

Combinations of the experimental observables listed here allow flavour separated PDFs of the
proton to be extracted in pQCD analyses, as discussed in section 3.




Figure 14: The kinematic plane in $(x, Q^2)$ accessed by the DIS and hadron collider experiments.
From [73].

Table 1: Table of datasets generally used in current QCD fits. The kinematic range of each
measurement in $x$ and $Q^2$ and the incident beam energy are also given. The normalisation
uncertainties of the charged lepton scattering experiments are typically 2-3%.

Table 2: The main processes included in current global PDF analyses ordered in three groups:
fixed target experiments, HERA and the Tevatron. For each process an indication of the dominant
partonic subprocesses, the primary partons which are probed and the approximate range
of $x$ constrained by the data is given. From [173].

Figure 15: The proton structure function $F_2(x, Q^2)$ measured in a wide kinematic range by
various DIS experiments. From [73].

Figure 16: The proton structure functions $xF_3(x, Q^2)$ and $F_L(x, Q^2)$. From [73].

3 Determination of parton distribution functions



3.1 Introduction and generalities 

Parton distribution functions are determined from fits of perturbative QCD calculations, based 
on the DGLAP evolution equations, to various sets of experimental data. These fits are regularly 
updated to account for new experimental input and theoretical developments. Most fits are 
performed at NLO, although leading order fits are still of interest, for example for Monte-Carlo 
simulations. With the recent, full calculation of the DIS cross section at NNLO [55], first NNLO 
fits are becoming available (see section 3.6). 

In the following, we will discuss in particular the latest NLO fits performed by the CTEQ/CT
group (CTEQ6.6 [174] and CT10 [175]), the MSTW group (MSTW08 [173]) and the NNPDF
collaboration (NNPDF2.0 [176] and NNPDF2.1 [177]), which try to include all relevant experimental
data. Other groups, e.g. GJR [178] or ABKM/ABM [81, 179, 180], also provide fits
using a subset of data. The H1 and ZEUS collaborations have also published QCD fits based on
their inclusive DIS data only (see e.g. HERAPDF1.0 [61]). The main differences between these
various fits are described in the next sections. These fits are publicly available via the LHAPDF
interface [181], which also provides access to older proton PDF fits as well as to photon and
pion PDFs.

The general ansatz used in QCD fits is the parameterisation of parton distributions at a so-called
starting scale $Q_0^2$, using a flexible analytic form. For example, one can choose to parameterise
the gluon density $xg(x)$, the valence quark densities $xu_v(x) = x(u(x) - \bar{u}(x))$ and
$xd_v(x) = x(d(x) - \bar{d}(x))$, the light sea distribution defined as
$xS(x) = x[2(\bar{u}(x) + \bar{d}(x)) + s(x) + \bar{s}(x)]$, and $x\Delta(x) = x(\bar{d}(x) - \bar{u}(x))$.
Most QCD analyses make use of a simple functional form, like:

$$ x f_i(x, Q_0^2) = A_i \, x^{\alpha_i} (1 - x)^{\beta_i} P_i(x) \qquad (43) $$

where $P_i(x)$ can be e.g. a polynomial function in $x$ or $\sqrt{x}$. The parameterisation can also be
based on interpolation polynomials or non-linear functions. The latter approach is exploited by
the NNPDF collaboration and is described in section 3.2.3 in more detail.

The DGLAP evolution equations are used to obtain, from these parameterised densities at $Q_0^2$,
the parton densities $xf(x, Q^2)$ at any $Q^2$. This allows the theoretical cross sections of the
processes of interest (DIS, Drell-Yan di-lepton production, jet production, ...) to be computed.
The parameters that define the PDFs at the starting scale (e.g. $\alpha_i$, $\beta_i$, ... in Eq. 43) can then be
obtained by fitting these theoretical predictions to the experimental measurements.

This is achieved by minimising a $\chi^2$ function. A usual choice for this function is:

$$ \chi^2 = \sum_{\rm exp} \chi^2_{\rm exp} \qquad (44) $$

where the individual contribution of each independent dataset is given by [182, 183]:

$$ \chi^2_{\rm exp} = \sum_i \frac{\left( d_i - \sum_k \beta_{k,i} s_k - t_i \right)^2}{\alpha_i^2} + \sum_k s_k^2 \qquad (45) $$

In this equation,

• $d_i$ denote the measurements and $t_i$ the corresponding theoretical predictions;

• the total uncorrelated uncertainty affecting the measurement $i$ is given by $\alpha_i$, which sums
in quadrature the statistical and the uncorrelated systematic uncertainties;

• $k$ labels the sources of correlated systematic uncertainties;

• $\beta_{k,i}$ is the amount of change of $d_i$ when the source $k$ (for example an energy scale) is
shifted by one standard deviation ($s_k = 1$); the values of $\beta_{k,i}$ are taken from the correlated
systematic error tables published by the experiments;

• the second term in Eq. 45 introduces a quadratic penalty $s_k^2$ when the data points are
moved coherently by $\beta_{k,i} s_k$ and restricts large deviations from $s_k = 0$.

When the parameters $s_k$ in Eq. 45 are fixed to zero, the fit is performed to the raw data points
published by the experiments, but the correlated systematic errors are ignored. Alternatively, the $s_k$
can be free parameters of the fit and determined by the $\chi^2$ minimisation. Technically, they are
obtained analytically [182, 183]: the $\chi^2$ is quadratic in $s_k$, hence $\partial\chi^2/\partial s_k = 0$ leads to a simple
matrix equation for the $s_k$. This means that the fit is not performed to the raw data, but to the
data shifted by the optimal setting for the systematic error sources as determined by the fit. In
that case, at the $\chi^2$ minimum, Eq. 45 is mathematically equivalent to the standard $\chi^2$ expression
involving the covariance matrix between the measurements,

$$ \chi^2_{\rm exp} = \sum_{i,j} (d_i - t_i) \, V^{-1}_{ij} \, (d_j - t_j) $$

with $V_{ij} = \alpha_i^2 (\delta_{ij} + \sum_k \beta_{k,i} \beta_{k,j} / \alpha_i^2)$. However, inverting the large matrix $V$ makes this
expression inconvenient, hence Eq. 45 is preferred. Moreover, it also facilitates the determination of
the fit uncertainties, as will be discussed in section 3.3.1.
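
The equivalence between Eq. 45, minimised over the shifts $s_k$, and the covariance-matrix form can be
checked numerically. The following is a minimal sketch in Python with NumPy, not code from any of the
fitting groups: the function names and the toy numbers are assumptions made for illustration, and the
sign convention for the shifts follows the form of Eq. 45 written with $d_i - \sum_k \beta_{k,i} s_k - t_i$.

```python
import numpy as np


def chi2_nuisance(d, t, alpha, beta):
    """Eq. 45 minimised analytically over the systematic shifts s_k.

    d, t  : measurements and theory predictions, shape (n,)
    alpha : total uncorrelated uncertainties, shape (n,)
    beta  : beta[k, i], shift of point i for a 1-sigma change of source k, shape (K, n)
    Returns (chi2 at the minimum, optimal shifts s).
    """
    r = d - t
    w = 1.0 / alpha**2
    # d(chi2)/d(s_k) = 0  <=>  (1 + beta W beta^T) s = beta W r
    a_mat = np.eye(beta.shape[0]) + (beta * w) @ beta.T
    b_vec = (beta * w) @ r
    s = np.linalg.solve(a_mat, b_vec)
    shifted = r - beta.T @ s
    return float(np.sum(w * shifted**2) + np.sum(s**2)), s


def chi2_covariance(d, t, alpha, beta):
    """Standard form with V_ij = alpha_i^2 delta_ij + sum_k beta_ki beta_kj."""
    r = d - t
    v = np.diag(alpha**2) + beta.T @ beta
    return float(r @ np.linalg.solve(v, r))


# Toy inputs (made-up numbers): 4 data points, 2 correlated systematic sources.
d = np.array([1.02, 0.95, 1.10, 0.99])
t = np.array([1.00, 1.00, 1.00, 1.00])
alpha = np.array([0.03, 0.04, 0.05, 0.03])
beta = np.array([[0.02, 0.02, 0.02, 0.02],
                 [0.01, -0.01, 0.03, 0.00]])

chi2_min, shifts = chi2_nuisance(d, t, alpha, beta)
print(chi2_min, chi2_covariance(d, t, alpha, beta), shifts)
```

The nuisance-parameter form only requires solving a $K \times K$ system (one row per systematic source),
whereas the covariance form involves the full $n \times n$ matrix, which illustrates why Eq. 45 is the
more convenient choice when the number of data points is large.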

3.2 Choices and assumptions 
3.2.1 Experimental input 

QCD fits may not include all experimental measurements described in section 2, despite the fact 
that they all provide sensitivity to some parton distribution functions. 

Including as much experimental input as possible provides maximal constraints. This "global 
fit" approach is followed by the CTEQ/CT, MSTW and NNPDF collaborations, which include 
typically 3000 experimental points in their latest analyses. However, tensions between differ- 
ent datasets may require the PDF uncertainties resulting from the fit to be enlarged (see sec- 
tion 3.3.2). If these tensions were due to problems with one experimental dataset (e.g. wrong 
calibration or underestimated systematic uncertainties), the usual procedure, that is equivalent 
to inflating the experimental errors of all measurements included in the fit, would result in an 
overestimation of the PDF uncertainties. On the other hand, apparent inconsistencies between 
datasets may also arise because the fit does not have enough flexibility, or because some underly- 
ing assumptions (see section 3.2.3) are not correct. In that case, the enlarged PDF uncertainties 
would not necessarily be over-conservative.

The fits performed by other groups include only a subset of all available data. For example,
the latest fits performed by GJR and ABKM include most measurements from deep inelastic
scattering experiments ($eN$, $\mu N$ and $\nu N$), Drell-Yan measurements from fixed target experiments
and jet production data from the Tevatron experiments, but Tevatron data on $W$ and $Z$
production are not included. The HERAPDF fits use only the measurements from the H1 and
ZEUS experiments$^{10}$, which are known to be fully consistent with each other and for which
the systematic uncertainties are very well understood. Despite the much reduced experimental
input, with about 600 points included in HERAPDF1.0, good constraints can be obtained on
most parton densities, over a very large kinematic range, as will be shown in section 3.4.1. Note
however that additional assumptions are made to compensate for the lack of sensitivity of $ep$
measurements alone to the flavour separation.

Besides the choice of datasets, some data points within a given dataset can be excluded deliberately.
That is the case in particular for the DIS measurements at very low $x$, e.g. $x < 10^{-5}$,
where DGLAP evolution may break down. The same holds for data points at very low $Q^2$ where
$\alpha_s(Q^2)$ would become too large to ensure a good convergence of the perturbative series. A typical
cut $Q^2_{\rm min} \sim 2$ GeV$^2$ is usually chosen. Data points at very high $x$ and low $W^2$ are often
removed as well, as higher twist corrections proportional to $1/Q^2$ are enhanced in this domain
(see section 2.1.6). A requirement that $W^2$ be above $\sim 10 - 15$ GeV$^2$ is usual. An alternative
approach was followed in [81], where a cubic spline function was chosen to parameterise the
function $H_i(x)$ that describes the higher twist corrections to $F_2$ (see Eq. 29), and the parameters
of this function were fitted together with the PDF parameters. The authors found that the higher twist
corrections improve considerably the description of the data of the SLAC experiments, even in
the region $W^2 > 12$ GeV$^2$.
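
The kinematic cuts described above can be sketched as a simple selection function. This is an
illustrative Python sketch, not the selection code of any fitting group: the function names and the
default thresholds are assumptions picked from within the ranges quoted in the text, and the relation
$W^2 = M_p^2 + Q^2 (1 - x)/x$ is the standard DIS kinematic identity, which is not spelled out in this
section.

```python
M_PROTON = 0.938  # proton mass in GeV


def w2(x, q2):
    """Invariant mass squared of the hadronic final state, W^2 = M_p^2 + Q^2 (1 - x) / x."""
    return M_PROTON**2 + q2 * (1.0 - x) / x


def passes_cuts(x, q2, x_min=1e-5, q2_min=2.0, w2_min=12.5):
    """True if a DIS point (x, Q^2 in GeV^2) survives the illustrative cuts.

    x_min  : removes the very low x region where DGLAP evolution may break down
    q2_min : removes low Q^2 points where alpha_s becomes too large
    w2_min : removes the high x, low W^2 region where higher twists are enhanced
    """
    return x >= x_min and q2 >= q2_min and w2(x, q2) >= w2_min


# Made-up (x, Q^2) points, for illustration only.
points = [(0.75, 10.0), (0.01, 5.0), (0.1, 1.5)]
selected = [p for p in points if passes_cuts(*p)]
print(selected)
```

Here the first point fails the $W^2$ requirement and the third fails the $Q^2$ requirement, so only the
second point would enter the fit.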

3.2.2 Theoretical choices 

The most important theoretical ingredients that can lead to different fit results are the following. 

• For fits performed beyond LO, one needs to choose the renormalisation scheme, usually
taken to be the $\overline{\rm MS}$ scheme. One also needs to choose the factorisation and renormalisation
scales, $\mu_F$ and $\mu_R$, which are used in the theoretical calculation. For DIS, a common
choice is to take $\mu_F^2 = \mu_R^2 = Q^2$.

• There is no unique way to treat the heavy flavours (HF). In the zero-mass variable flavour
number scheme (ZM-VFNS), the charm density, for example, is set to zero below $Q^2 \sim m_c^2$.
Above this threshold, the charm is generated by gluon splitting and is treated as massless.
The drawback of this approach is that it ignores charm mass effects in the threshold
region. In contrast, in the fixed flavour number scheme (FFNS), there is no PDF for the
charm and bottom, i.e. there are only three active flavours. For $W^2$ above the production
threshold, the DIS production of charm proceeds via photon-gluon fusion, $\gamma g \to c\bar{c}$. The
drawback of this treatment is that the calculations involve terms in $\ln(Q^2/m_c^2)$ which become
large at high $Q^2$ and would need to be resummed. The state-of-the-art approach,
called general-mass variable flavour number scheme (GM-VFNS), can be seen as an interpolation
between the ZM-VFNS and the FFNS (this is put on rigorous grounds in [184]).
Such a scheme is not easy to implement, especially at NNLO, and the various fit groups
choose different prescriptions. The so-called ACOT scheme [185, 186] is used by the
CTEQ/CT group since CTEQ6.5 [187], NNPDF2.1 uses the FONLL scheme [188, 189]
(previous versions of the NNPDF fit were performed in the FFNS scheme), and the
Thorne-Roberts prescription [190, 191] is used in MSTW08. These are reviewed in [192].

$^{10}$ Only the inclusive NC and CC measurements were included in HERAPDF1.0. More recent (preliminary)
fits, HERAPDF1.5 and HERAPDF1.7 [114], also include exclusive HERA measurements, such as jet or charm
production.




Figure 17: Ratio of the CTEQ6.5 $u$ density to that of CTEQ6.1, at a scale $Q = 5$ GeV. The
contour shows the uncertainty at 90% confidence level of the CTEQ6.1 density. From [187].

A different treatment of heavy flavours in QCD fits can lead to sizable differences. An
important update of the CTEQ6.5 fit compared to the previous release, CTEQ6.1, came
from the treatment of heavy quarks in a general-mass variable flavour number scheme.
Using this scheme instead of the ZM-VFNS led to a considerable improvement
of the fit, the $\chi^2$ of the fit being reduced by $\sim 200$ units for $\sim 2700$ data points. The
resulting CTEQ6.5 PDFs mainly differ from the previous CTEQ6.1 fit by larger $u$ and
$d$ distributions in the region $x \sim 10^{-3}$, for a wide range in $Q^2$, as illustrated in Fig. 17.
This resulted in a $\sim 8\%$ increase of the predicted $W$ and $Z$ cross sections at the LHC,
compared to previous CTEQ estimates, and brought the CTEQ-based prediction closer
to that obtained using the MSTW08 parton distributions. The uncertainties associated
with the remaining freedom in defining a GM-VFNS at NLO or NNLO have been studied
in [193]. At NLO, they result in a 2-3% uncertainty on the $Z$ production cross section
at the LHC.

• The values of the heavy quark masses $m_c$ and $m_b$ differ between analyses, and in some
cases are treated as free parameters of the fit. The values chosen also depend on the renormalisation
scheme used to define them and to calculate heavy quark related observables.
The on-shell scheme uses the pole mass, defined to coincide with the pole of the heavy
quark propagator at each order of pQCD. This definition is chosen in most QCD analyses.
Instead, the $\overline{\rm MS}$ scheme, used in the recent ABM11 analysis [81], introduces a running
mass $m_q(Q^2)$. It was shown in [194] that the perturbative stability of predictions for the
heavy quark structure functions is better in terms of the $\overline{\rm MS}$ mass, thus leading to reduced
theoretical uncertainties due to variations of the renormalisation and factorisation scales.
In most QCD analyses, the masses $m_c$ and $m_b$ are fixed in the fit, and fits obtained when
the masses are varied from the chosen central values may be provided together with the
central fit. In the ABM11 analysis, $m_b(Q = m_b)$ and $m_c(Q = m_c)$ are fitted together
with the PDF parameters, with external constraints given by their world average values.
The HERA experiments have also performed preliminary fits where the PDFs are fitted
together with the pole mass of the charm quark [114].

• The value of the strong coupling constant $\alpha_s(M_Z)$ is an important consideration in any
QCD analysis, unless it is treated as a free parameter in the fit. Many experimental observables
can be used to measure $\alpha_s(M_Z)$. The world averages of $\alpha_s(M_Z)$ do not include
the measurements of scaling violations of the structure function $F_2$, as $dF_2/d\ln(Q^2)$ is
sensitive to the product of $\alpha_s$ times the gluon density. However, $\alpha_s(M_Z)$ can be determined
together with the parton distribution functions in a combined QCD analysis. In
order to disentangle the gluon density from the strong coupling constant, these QCD fits
should include jet data in addition to structure function measurements [195, 196]. In the
central fit of the MSTW08 and ABM11 analyses, $\alpha_s(M_Z)$ is fitted together with the other
parameters that define the PDFs at the starting scale. In contrast, other groups fix $\alpha_s(M_Z)$
and provide several sets of fits, corresponding to a range of fixed $\alpha_s$ values$^{11}$.

• Different numerical methods can be used to calculate the theoretical cross sections that
are needed in the fit. While the NLO calculation of inclusive DIS cross sections
can be done relatively fast, the exact calculation of jet cross sections, for example, using
standard techniques requires huge CPU resources and, in practice, numerical approximations
have to be used when such processes are included in an NLO QCD fit (where the
calculation of cross sections has to be done for every iteration of the fit). For example,
the FastNLO technique [197] allows rapid calculations for a large number of jet cross
sections, with a very high accuracy. The method rewrites the cross sections as a sum of
products where the time-consuming step is factorised out, such that it needs to be done
only once. This approach is used by the MSTW and NNPDF groups to calculate the jet
cross sections. A similar technique is used in APPLgrid [198], which covers a broader
range of processes, and in the FastKernel approach [176] developed by the NNPDF collaboration.
Alternatively, or for processes for which no fast NLO calculation is available,
a "K-factor" approximation can be used. For each bin of the experimental measurement,
the factor $K = \sigma_{\rm NLO}/\sigma_{\rm LO}$ is first calculated for a given PDF. In the calculations performed
in each iteration of the fit, only the leading order cross section is calculated, and
it is multiplied by this factor $K$ to account for the higher order corrections. Usually
the procedure is repeated: the $K$-factor is re-evaluated using the PDFs from the
converged fit and another series of fit iterations is performed.
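
The iterated "K-factor" procedure described in the last item above can be sketched with a toy model.
This is a hedged illustration in Python with NumPy, not an actual cross section code: the functions
`sigma_lo` and `sigma_nlo`, the single fit parameter `p` standing in for the PDF parameters, and all
numbers are assumptions invented for the example.

```python
import numpy as np


def sigma_lo(p, c):
    """Toy leading-order cross section per bin (fast to evaluate)."""
    return p * c


def sigma_nlo(p, c):
    """Toy stand-in for the slow NLO calculation; its ratio to LO depends weakly on p."""
    return sigma_lo(p, c) * (1.2 + 0.05 * p)


def fit_with_fixed_k(data, c, k):
    """Least-squares fit of p to the data with the model k * sigma_lo(p, c), unit errors."""
    basis = k * c
    return float(np.dot(data, basis) / np.dot(basis, basis))


def k_factor_fit(data, c, p_start=1.0, n_rounds=10):
    """Alternate between re-evaluating K and refitting with K held fixed."""
    p = p_start
    for _ in range(n_rounds):
        k = sigma_nlo(p, c) / sigma_lo(p, c)   # one slow evaluation per round
        p = fit_with_fixed_k(data, c, k)       # only fast LO-type evaluations inside the fit
    return p


c = np.array([1.0, 2.0, 3.0])   # made-up bin-dependent coefficients
data = sigma_nlo(2.0, c)        # pseudo-data generated with p = 2
p_fit = k_factor_fit(data, c)
print(p_fit)
```

In this toy the first round already lands close to the generating value, and the later rounds absorb the
residual dependence of $K$ on the fitted parameter, which mirrors the reason for repeating the
procedure with the PDFs of the converged fit.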



Moreover, the fits usually require basic consistency constraints to be satisfied. For example,
a set of parameters that would lead to negative cross sections, or to negative values of $F_2$ or of $F_L$,
should not be considered as a valid fit. On the other hand, a valid fit may result, for example,
in a negative gluon distribution at low $x$, since a parton distribution function is not a physical
observable.

$^{11}$ To facilitate comparisons with results from other groups, MSTW08 and ABM11 also provide fits for a range
of (fixed) $\alpha_s(M_Z)$ values.

3.2.3 Parameterisation choices and assumptions 

In any QCD analysis, a priori choices have to be made to define the starting conditions. These 
are: the starting scale of the fit, the set of densities that are parameterised, and the functional 
form chosen for the parameterisation. Some assumptions usually complement these choices. 
The systematic uncertainty that should be associated with these choices is not easy to assess 
and is usually not estimated explicitly. 

Large values of the starting scale $Q_0^2$ are impractical. Indeed, since DGLAP evolution has
the effect of washing out the $x$-dependence of PDFs with increasing $Q^2$, parameterising the
densities at a low scale offers more freedom in the fit. However, the scale $Q_0^2$ cannot be too
low, since the DGLAP evolution should be valid down to that scale. Typical values for $Q_0^2$
range between 1 GeV$^2$ and a few GeV$^2$. For fits that treat the heavy flavours in a GM-VFNS
scheme, it is convenient to set the scale $Q_0^2$ below the charm threshold, $Q_0^2 < m_c^2$, such that$^{12}$
$c(x, Q_0^2) = 0$. The GJR group uses a lower starting scale, for example $Q_0^2 = 0.5$ GeV$^2$ in [178].
They assume that all PDFs have a valence form (i.e. the parameter $\alpha$ in Eq. 43 is positive) at a
low scale; the gluon and sea quarks tend to zero at low $x$, and the large values for these PDFs at
$Q^2$ above a few GeV$^2$ are entirely generated by the DGLAP evolution$^{13}$.

The set of parton densities that are parameterised at the starting scale of the fit, and that are the
input to the evolution equations, has to be chosen depending on the experimental measurements
that are included in the fit. One does not fit the eleven quark and gluon distributions since the
data do not contain enough information to disentangle them all. Instead, well-defined combinations
of PDFs are fitted. In addition to the gluon density, at least two quark distributions are
needed (one singlet distribution that evolves as Eq. 22, one non-singlet that follows Eq. 21).
A QCD fit to H1 NC DIS cross sections only, using $xg(x)$ and two quark distributions as input
to the DGLAP equations, was performed in [203] and allowed the gluon distribution to be
extracted. Additional measurements are necessary in order to also extract the quark densities,
and at least four quark distributions need to be parameterised. For example, the HERAPDF1.0
fit [61], described in more detail in section 3.4, parameterises $g(x)$, $\bar{U}(x)$, $\bar{D}(x)$ and the valence
distributions $u_v(x)$ and $d_v(x)$, at a starting scale just below the charm threshold. Since the included
data have no sensitivity to constrain the strange density, $\bar{s}(x)$ is assumed to be proportional to
$\bar{d}(x)$ at $Q_0^2$ and $s(x) = \bar{s}(x)$ is assumed. The MSTW08 analysis also parameterises the gluon
and valence densities, $g$, $u_v$ and $d_v$, together with the light sea density $S$ and the $\bar{d} - \bar{u}$ asymmetry.
Moreover, since it includes strange-sensitive measurements, the total strangeness density
$s + \bar{s}$ and the strange asymmetry $s - \bar{s}$ are also fitted.

$^{12}$ A non-perturbative "intrinsic charm" component of the proton [199] would result in $c(x, Q_0^2) \neq 0$ even below
the charm threshold. That possibility was explored for example in [200].

$^{13}$ In the original idea [201, 202], the proton was only made of valence quarks at a very low scale, the gluons and
sea quarks being dynamically generated by the DGLAP evolution as $Q^2$ increases. The experimental data did not
support this assumption, which was then revised in order to also include the gluon and sea quarks at the starting
scale, still keeping the assumption of a valence-like shape.

The functional form that is used to parameterise these input densities at the starting scale differs
depending on the analyses. It should be flexible enough to allow for a good fit. However, too
much freedom in the parameterisation should be avoided, as this leads to unstable fits and secondary
minima. A functional form like Eq. 43 is often chosen. The low $x$ behaviour is motivated
by Regge phenomenology, which suggests (see for example [ ]) that $xg(x)$ and $x\bar{q}(x)$ behave
at low $x$ as $(1/x)^{\alpha_P(0) - 1}$ with $\alpha_P(0) \simeq 1.08$, while the valence distributions depend instead on
the "Reggeon intercept" $\alpha_R(0) \simeq 0.5$, i.e. $xq_v(x) \sim (1/x)^{\alpha_R(0) - 1} \sim x^{0.5}$. The high $x$ behaviour
can be motivated by simple dimensional arguments [204] based on the energy dependence of
the scattering cross section$^{14}$. While these predictions were approximately borne out by early
measurements, the existing experimental data now show that the PDFs cannot be described by
a single power-law at high $x$, hence the correction function $P_i(x)$ in Eq. 43 is necessary.

The HERAPDF or MSTW analyses take $P_i(x)$ to be of the form

$$ P_i(x) = 1 + a_i x + b_i \sqrt{x} + \ldots , $$

whereas in the fits performed by the CTEQ/CT collaboration, it is chosen as

$$ P_i(x) = \exp(a_i x + b_i \sqrt{x} + c_i x + d_i x^2) . $$

In recent global fits, additional flexibility is needed for the gluon distribution, in order to obtain
a good fit to all the data. The MSTW08 analysis uses

$$ xg(x) = A_g \, x^{\alpha_g} (1 - x)^{\beta_g} (1 + a_g x + b_g \sqrt{x}) + A'_g \, x^{\alpha'_g} (1 - x)^{\beta'_g} \qquad (46) $$

which allows the gluon density to become negative at low $x$, while, in the CT10 analysis, extra
freedom at low $x$ is given by

$$ xg(x) = A_g \, x^{\alpha_g} (1 - x)^{\beta_g} \exp(a_g x + d_g x^2 - e_g / x^{k_g}) . \qquad (47) $$
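
The functional forms above can be made concrete with a short numerical sketch. The Python code below
is only an illustration under stated assumptions: the function names are hypothetical, the parameter
values are made up (they are not fitted values from any analysis), and the second term is written
without a correction function, following the two-term structure of Eq. 46.

```python
import math


def xf(x, norm, alpha, beta, corr=lambda x: 1.0):
    """Eq. 43: x f(x) = A x^alpha (1 - x)^beta P(x) at the starting scale."""
    return norm * x**alpha * (1.0 - x) ** beta * corr(x)


def p_poly(a, b):
    """Polynomial-type correction function P(x) = 1 + a x + b sqrt(x)."""
    return lambda x: 1.0 + a * x + b * math.sqrt(x)


def xg_two_term(x, first, second):
    """Two-term gluon in the spirit of Eq. 46: a standard term plus a second power-law term."""
    norm, alpha, beta, a, b = first
    norm2, alpha2, beta2 = second
    return xf(x, norm, alpha, beta, p_poly(a, b)) + xf(x, norm2, alpha2, beta2)


# Made-up parameter values, chosen only to show the change of sign at low x.
FIRST = (1.0, 0.2, 5.0, 0.0, 0.0)   # A, alpha, beta, a, b
SECOND = (-0.2, -0.3, 10.0)         # A', alpha', beta'

for x in (1e-4, 1e-2, 1e-1):
    print(x, xg_two_term(x, FIRST, SECOND))
```

With a negative normalisation and a steeper low $x$ power in the second term, the sum turns negative
at small $x$ while staying positive at larger $x$, which is the extra freedom that the two-term form
provides.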

The number of free parameters is usually reduced by imposing the number sum rules, Eq. 3 and
Eq. 4, together with the momentum sum rule, Eq. 20, which helps fix the gluon normalisation
and connects the low $x$ and high $x$ behaviour of the gluon density. Additional assumptions are
often made. For example, the CT10 analysis assumes the same low $x$ power-law for the input
distributions $x\bar{u}$ and $x\bar{d}$, i.e. $\alpha_{\bar{u}} = \alpha_{\bar{d}}$ in Eq. 43, as well as the equality of the normalisation
parameters, $A_{\bar{u}} = A_{\bar{d}}$, such that $\bar{d} - \bar{u} \to 0$ as $x \to 0$. A similar assumption is made for the
HERAPDF1.0 fit. All in all, there are 10 free PDF parameters in the HERAPDF1.0 analysis
and 26 free parameters in the CT10 fit. The MSTW08 fit has 29 free parameters, including the
value of $\alpha_s(M_Z)$ which is fitted together with the parton densities$^{15}$.
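
How sum rules remove normalisation parameters can be illustrated for the simplest case of Eq. 43 with
$P_i(x) = 1$, where the integrals reduce to Euler Beta functions. The sketch below is in Python and rests
on assumptions that are not taken from the text: the number sum rules are taken in their usual form
(two valence up quarks, one valence down quark), the function names are hypothetical, and the shape
parameters and the quark momentum fraction are made-up numbers.

```python
import math


def beta_fn(a, b):
    """Euler Beta function B(a, b), valid for a > 0 and b > 0."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def valence_norm(n_valence, alpha, beta):
    """A such that f(x) = A x^(alpha - 1) (1 - x)^beta integrates to n_valence on [0, 1]."""
    return n_valence / beta_fn(alpha, beta + 1.0)


def gluon_norm(quark_momentum, alpha_g, beta_g):
    """A_g such that x g(x) = A_g x^alpha_g (1 - x)^beta_g carries 1 - quark_momentum."""
    return (1.0 - quark_momentum) / beta_fn(alpha_g + 1.0, beta_g + 1.0)


# Made-up shape parameters and quark momentum fraction, for illustration only.
a_uv = valence_norm(2, alpha=0.5, beta=3.0)
a_g = gluon_norm(quark_momentum=0.55, alpha_g=-0.1, beta_g=5.0)
print(a_uv, a_g)
```

Once the shape parameters are chosen, the normalisations follow from the sum rules and need not be
fitted. The gluon normalisation depends on both $\alpha_g$ and $\beta_g$, which is one way to see how the
momentum sum rule ties the low $x$ and high $x$ behaviour together.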

$^{14}$ These arguments predict that, at high $x$, $xq(x) \sim (1 - x)^{\beta}$ with $\beta = 2n_s - 1$, where $n_s$ is the number of
"spectator quarks" that are attendant to the parton in the Fock expansion of the proton wave-function, i.e. $n_s = 2$
for valence quarks ($qqq$), $n_s = 3$ for the gluon ($qqqg$), and $n_s = 4$ for anti-quarks ($qqqq\bar{q}$). Note that these
predictions are not well-defined in the context of pQCD, since they do not provide the scale at which they should
hold.

$^{15}$ Three additional parameters associated with nuclear corrections are also fitted in this analysis.

Although the functional form chosen to parameterise the densities at $Q_0^2$ is rather flexible, a potential
parameterisation bias does remain, which is difficult to avoid. Moreover, the uncertainties,
obtained as explained in section 3.3.1, can be considerably underestimated since they usually
do not include any parameterisation uncertainty. An example of a parameterisation bias is illustrated
in Fig. 18, which shows the relative uncertainty on the gluon density at low $x$, at a
scale $Q^2 = 4$ GeV$^2$. The huge difference between the uncertainty obtained with CT10 and that
resulting from the previous fit CTEQ6.6 is due to the more flexible gluon parameterisation used
in the former fit. At low $x$, the CTEQ6.6 parameterisation is equivalent to a single power-law,
$xg(x) \sim A x^{\alpha}$. Neglecting the correlation between the $A$ and $\alpha$ parameters, this parameterisation
prevents the relative uncertainty from growing faster than linearly with respect to $\ln x$,
since $\Delta(xg)/(xg) = (\Delta\alpha) \ln x$. As there are no experimental data at $x$ below $10^{-5}$ and $Q^2$ above
the typical lower $Q^2$ cut used in the fits, the small uncertainty band resulting from the CTEQ6.6
fit can only be artificial. Indeed, using a very similar experimental input, but the more flexible
parameterisation Eq. 47 for the gluon density, the uncertainty increases dramatically as shown
by the CT10 contour in Fig. 18. With this more flexible parameterisation, the CT uncertainty
becomes comparable to that obtained by MSTW08, where extra freedom to the low $x$ gluon is
provided by Eq. 46.
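
The linear growth in $\ln x$ quoted for the single power-law form follows from standard error
propagation. A short derivation, written here as a sketch under the stated assumption that $A$ and
$\alpha$ are uncorrelated, reads:

```latex
\begin{align}
  xg(x) &= A\,x^{\alpha}
  \quad\Rightarrow\quad
  \ln\bigl(xg\bigr) = \ln A + \alpha \ln x , \\
  \frac{\delta(xg)}{xg} &= \frac{\delta A}{A} + \delta\alpha \,\ln x , \\
  \left(\frac{\Delta(xg)}{xg}\right)^{2}
    &= \left(\frac{\Delta A}{A}\right)^{2} + (\Delta\alpha)^{2} \ln^{2} x
    \qquad \text{(uncorrelated $A$ and $\alpha$)} .
\end{align}
```

At low $x$ the second term dominates, so the relative uncertainty behaves as $\Delta\alpha \, |\ln x|$. For an
illustrative (made-up) value $\Delta\alpha = 0.02$ and negligible $\Delta A$, this gives about 14% at
$x = 10^{-3}$ and about 23% at $x = 10^{-5}$: the band can only widen logarithmically, whatever the data
coverage.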




Figure 18: Relative uncertainty (90% confidence level contours) on the gluon density at $Q^2 =
4$ GeV$^2$, as obtained from the CTEQ6.6, CT10, MSTW08 and NNPDF2.1 analyses.

The parameterisation bias can be assessed, to some extent, by varying the starting scale $Q_0^2$,
and/or by making variations around the chosen functional form at the given $Q_0^2$ and providing
an additional parameterisation uncertainty, as pioneered in [60] and also estimated in the
HERAPDF analysis; however, the resulting uncertainties remain subjective.

Alternative choices for the densities $xf_i(x)$ can be based on Chebyshev polynomials, on any
interpolation polynomials, or on non-linear functions. The latter approach is exploited by the
NNPDF collaboration, which uses neural networks to parameterise the densities. The formalism
is described in [176] and references therein. Neural networks are just another functional form,
which generalises parameterisations like $xf(x) = \sum_n a_n P_n(x)$ based on interpolation polynomials
$P_n(x)$. They allow non-linear dependencies of the function on the fitted parameters $a_n$.

The analysis presented in [176] fits the gluon density together with the six densities for light
quarks and anti-quarks$^{16}$, $u, \bar{u}, d, \bar{d}, s, \bar{s}$. The neural networks chosen to parameterise these
densities have 37 free parameters each. Hence, the resulting parameterisation has a total of
$7 \times 37 = 259$ free parameters, which is much larger than the number of free parameters, $O(25)$,
which are fitted in QCD analyses based on a standard functional form like Eq. 43. The use of
such a flexible parameterisation scheme considerably reduces any parameterisation bias.
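
The parameter count quoted above can be reproduced by counting the weights and biases of a small
feed-forward network. The Python sketch below assumes layer sizes of 2-5-3-1, which is the architecture
reported in the NNPDF publications but is not stated in this text, so it should be read as an assumption;
the function name is hypothetical.

```python
def n_parameters(layers):
    """Number of weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


ARCHITECTURE = (2, 5, 3, 1)  # assumed layer sizes: inputs, two hidden layers, one output
N_PDFS = 7                   # gluon plus six light quark and anti-quark densities

per_pdf = n_parameters(ARCHITECTURE)
total = N_PDFS * per_pdf
print(per_pdf, total)
```

Under this assumption each network has $2 \cdot 5 + 5 + 5 \cdot 3 + 3 + 3 \cdot 1 + 1 = 37$ parameters,
consistent with the number given in the text.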

3.3 Treatment of experimental systematic uncertainties 

3.3.1 The various methods 

A lot of work has been done over the past ~ 15 years to assess the uncertainties on parton den- 
sities extracted from QCD fits [183]. The task is not trivial since, as soon as many experimental 
data points are included in the fit, a standard statistical approach does not appear to be adequate, 
as will be discussed in 3.3.2. 

$^{16}$ The flexibility of the neural network allows any decomposition to be used, as was checked explicitly in [205].

While most of the fits now minimise a $\chi^2$ function similar to Eq. 44 and Eq. 45, this $\chi^2$ definition
can be used in different ways:

• in the so-called "offset method", the parameters $s_k$ are set to zero in the central fit,
i.e. the central fit is performed without taking into account the correlated systematic
errors. Then, for each source of systematic error, $s_k$ is set to $\pm 1$ and the fit is redone.
The uncertainty of a given quantity (e.g. a parton density) is calculated by adding in
quadrature all differences to the quantity obtained in the central fit.

• in the "Hessian method", the $s_k$ are not fixed, but are parameters of the fit. This means
that the central fit is performed to the data shifted by the optimal setting for the systematic
error sources as determined by the fit. The errors on the fitted PDF parameters ($p_\alpha$) are
obtained from $\Delta\chi^2 = T^2$ with $T = 1$ or larger, see section 3.3.2. The error on any given
quantity $F$ is then obtained from standard error propagation:

$$ \sigma_F^2 = \Delta\chi^2 \sum_{\alpha,\beta} \frac{\partial F}{\partial p_\alpha} \, C_{\alpha,\beta} \, \frac{\partial F}{\partial p_\beta} \qquad (48) $$

where the covariance matrix $C = H^{-1}$ is the inverse of the Hessian matrix defined by
$H_{\alpha,\beta} = \frac{1}{2} \partial^2\chi^2 / \partial p_\alpha \partial p_\beta$, evaluated at the $\chi^2$ minimum. The method developed in [183],
which is now widely used by most groups, allows the Hessian matrix to be diagonalised
without numerical instabilities; the error $\sigma_F^2$ can then be calculated simply as

$$ \sigma_F^2 = \frac{1}{4} \sum_i \left[ F(S_i^+) - F(S_i^-) \right]^2 \qquad (49) $$

where the sum runs over the eigenvectors and $S_i^+$ and $S_i^-$ are PDF sets displaced by $\Delta\chi^2$
along the $i$-th eigenvector direction. For fits that use the Hessian method to determine the
uncertainties, the PDF sets $S_i^+$ and $S_i^-$ are stored in the public LHAPDF library, together
with the set corresponding to the central fit.
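
The relation between the covariance-matrix form, Eq. 48, and the eigenvector-set form, Eq. 49, can be
checked on a toy problem. The sketch below uses Python with NumPy; the Hessian, the best-fit point and
the observable are made-up inputs, the observable is assumed to be linear in the parameters (in which
case the two expressions agree exactly), and the function names are hypothetical.

```python
import numpy as np


def sigma_covariance(grad, hessian, delta_chi2=1.0):
    """Eq. 48 with C = H^{-1}: sigma_F^2 = delta_chi2 * grad^T C grad."""
    return float(np.sqrt(delta_chi2 * grad @ np.linalg.solve(hessian, grad)))


def sigma_eigenvector_sets(f_plus, f_minus):
    """Eq. 49: sigma_F = 0.5 * sqrt(sum_i [F(S_i^+) - F(S_i^-)]^2)."""
    diff = np.asarray(f_plus) - np.asarray(f_minus)
    return float(0.5 * np.sqrt(np.sum(diff**2)))


# Toy example (made-up numbers): two fit parameters, observable linear in them.
hessian = np.array([[40.0, 12.0],
                    [12.0, 10.0]])   # H = (1/2) d^2 chi2 / dp dp at the minimum
p_best = np.array([0.7, 3.1])
grad = np.array([2.0, -0.5])         # dF/dp
tolerance = 1.0                      # T, so that delta_chi2 = T^2


def observable(p):
    return float(grad @ p)


# Displace the best fit by delta_chi2 = T^2 along each eigenvector of H:
# chi2 - chi2_min = lambda * step^2 along an eigenvector with eigenvalue lambda.
eigvals, eigvecs = np.linalg.eigh(hessian)
f_plus, f_minus = [], []
for lam, vec in zip(eigvals, eigvecs.T):
    step = tolerance / np.sqrt(lam) * vec
    f_plus.append(observable(p_best + step))
    f_minus.append(observable(p_best - step))

sigma_48 = sigma_covariance(grad, hessian, tolerance**2)
sigma_49 = sigma_eigenvector_sets(f_plus, f_minus)
print(sigma_48, sigma_49)
```

In practice a user only needs the values of the observable computed with the stored $S_i^\pm$ sets, so
Eq. 49 can be applied without access to the Hessian or to the derivatives of $F$.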

The offset method gives fitted theoretical predictions which are as close as possible to the "raw"
data points. It does not use the full statistical power of the fit to correct the data for the best
estimate of the systematic shifts, since it does not rely on the systematic uncertainties being Gaussian
distributed. The offset method thus appears to be more conservative than the Hessian method.
It usually results in larger uncertainties than those obtained from the Hessian method, when
the criterion $\Delta\chi^2 = 1$ is used to obtain the error bands$^{17}$. With the Hessian method, model
uncertainties (e.g. varying $\alpha_s$, $Q_0^2$, ...) are often larger than the fit uncertainties. This is because
each model choice can result in different values of the systematic shifts, i.e. when changing the
model one does not fit the same data points.

Alternatively, a Monte-Carlo approach can be used (see for example [206]), which avoids the
use of the error matrix formalism of Eq. 48. A set of $N_{rep}$ replicas of the n experimental
measurements is built by sampling the probability distribution defined by the data, such that
the means, variances and covariances given by the replicas are those of the experimental
measurements. The fit is then performed separately on each Monte-Carlo replica. The best fit is
defined as the average over the replicas, and uncertainties on physical quantities are obtained
from the variance over the replicas. This method can be used for any choice of the parameterisation
of parton densities, but it is most convenient when the parameterisation is more complex than the
standard functional form of Eq. 43, leading to a much larger number of free parameters. QCD fits
using standard parameterisations like Eq. 43 usually make use of the Hessian matrix method to
propagate the systematic uncertainties (this is the case for the analyses performed by the CTEQ
and the MSTW groups), although the fit performed by the H1 collaboration in [122] estimated
the uncertainties with the Monte-Carlo approach. The NNPDF collaboration, which uses a more
flexible parameterisation, always propagates the systematic uncertainties with the Monte-Carlo
method. For the NNPDF2.0 analysis presented in [176], an ensemble of $N_{rep} = 1000$ replicas
of the measurements was used. The result of the $N_{rep}$ fits performed over these replicas (i.e.
$N_{rep}$ sets of 259 parameters each) is stored in the LHAPDF package, and can be used to predict
mean values and uncertainties on physical observables.
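The replica procedure can be sketched with a toy example. The code below is only an illustration under simplifying assumptions: the "data" are four hypothetical measurements of a single constant, each with an uncorrelated error and one fully correlated additive systematic source, and the "fit" is a weighted average rather than a QCD fit. All names and numbers are invented for the example.

```python
import random
import statistics


def make_replicas(data, stat, syst, n_rep, seed=1):
    """Generate pseudo-data replicas of the measurements.

    Each point fluctuates independently by its statistical error, and all
    points of a replica shift together by one correlated systematic source.
    """
    rng = random.Random(seed)
    replicas = []
    for _ in range(n_rep):
        shift = rng.gauss(0.0, 1.0)  # common to all points of this replica
        replicas.append([
            d + rng.gauss(0.0, 1.0) * s + shift * c
            for d, s, c in zip(data, stat, syst)
        ])
    return replicas


def fit_constant(points, stat):
    """Toy 'fit': statistically weighted average of the points."""
    weights = [1.0 / s ** 2 for s in stat]
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)


data = [1.02, 0.97, 1.01, 1.00]  # hypothetical measurements
stat = [0.02, 0.03, 0.02, 0.04]  # uncorrelated errors
syst = [0.01, 0.01, 0.01, 0.01]  # one fully correlated source

fits = [fit_constant(r, stat) for r in make_replicas(data, stat, syst, 1000)]
best_fit = statistics.mean(fits)      # central value: average over replicas
uncertainty = statistics.stdev(fits)  # spread of the fitted value over replicas
```

The spread over the replicas includes both the statistical and the correlated systematic contribution, without any covariance matrix having to be inverted or linearised.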

3.3.2 The tolerance parameter $\Delta\chi^2 = T^2$ in global fits with "Hessian" uncertainties

Ideally, the error bands corresponding to the 68% (one standard deviation) confidence level (CL)
should be obtained from the well-known criterion $\Delta\chi^2 = 1$, or $\Delta\chi^2 = 2.71$ for the 90% CL
(1.64 standard deviation) contours. This would be appropriate when fitting consistent datasets
to a well defined theory, with systematic uncertainties being Gaussian distributed. However,
in practice, these conditions are not necessarily fulfilled. For example, when fitting data from
various experiments, it can happen that some datasets are only marginally compatible with the others,
possibly because some systematic uncertainty has been underestimated. Such datasets should
not be dropped from the fit unless there is clear experimental evidence that the measurement
is incorrect. Instead, the level of inconsistency between the datasets should be reflected in the
uncertainties of the fit. This can be done by considering the sets of PDF parameters as alternative
hypotheses, and by allowing all fits for which a desired level of consistency is obtained for all
datasets. If a dataset consists of N experimental points, its partial $\chi^2$ should be about $N \pm \sqrt{2N}$.
In practice, a tolerance parameter T is chosen such that the criterion $\Delta\chi^2 = T^2$ ensures that
each dataset is described within the desired confidence level. An example procedure to obtain
the numerical value of T can be found in [207]. For example, the 90% CL contours of CTEQ6.6
correspond to $T = 10$ ($T \sim 6$ for 68% CL), while the MRST fits [208] used $T = \sqrt{50} \sim 7$. The
MSTW08 analysis uses a "dynamical tolerance" [173] where T can be different for the various
eigenvectors of the Hessian matrix, with values ranging between $T \sim 1$ and $T \sim 6.5$ for the
68% CL contours.

17 However, when the systematic errors are smaller than the statistical errors, both methods give very similar
results.

While this approach is well motivated and based on how far the parameters can be varied while
still giving an acceptable description of all the datasets, one should keep in mind that setting
$\Delta\chi^2 = 100$ or 50 corresponds to an increase of the errors of all experiments by a factor of
typically 5 to 6, including those for which the measurements are very well controlled.
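The numbers quoted in this section can be reproduced with a few lines of arithmetic. The sketch below assumes the quadratic approximation around the minimum, in which Hessian errors scale linearly with $T = \sqrt{\Delta\chi^2}$, and uses $\sqrt{2.71} \approx 1.65$ to convert a 90% CL tolerance into a 68% CL one. The function names and the 200-point dataset are hypothetical.

```python
import math


def chi2_expected_range(n_points):
    """One-standard-deviation range N +/- sqrt(2N) of a partial chi2."""
    spread = math.sqrt(2.0 * n_points)
    return n_points - spread, n_points + spread


def error_inflation(delta_chi2):
    """Scaling of Hessian errors relative to delta_chi2 = 1 (quadratic approximation)."""
    return math.sqrt(delta_chi2)


# 90% CL expressed in standard deviations for one parameter (about 1.65).
Z_90 = math.sqrt(2.71)

low, high = chi2_expected_range(200)  # hypothetical dataset with 200 points
t_cteq_90 = error_inflation(100.0)    # T = 10, the CTEQ6.6 90% CL tolerance
t_cteq_68 = t_cteq_90 / Z_90          # the same tolerance rescaled to 68% CL
t_mrst = error_inflation(50.0)        # T = sqrt(50), the MRST tolerance
```

For the 200-point example the acceptable partial $\chi^2$ range is 180 to 220, and the CTEQ6.6 tolerance rescaled to 68% CL comes out close to 6, as quoted above.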

3.4 QCD fits to DIS data 

As seen in section 2.3, the HERA experiments performed high precision measurements of NC
and CC DIS in a large kinematic domain, both in $e^+p$ and $e^-p$ collisions. In particular:

• the precise measurement of the scaling violations of the structure function $F_2$ in NC DIS,
i.e. its logarithmic dependence on the four-momentum transfer squared $Q^2$, gives access
to the gluon density at low and medium x;

• the precise measurement of $F_2$ in NC DIS at low and medium x sets strong constraints on
the combination $4(U + \bar{U}) + (D + \bar{D})$ where U and D denote the combined up-type and
down-type quark densities, $U = u + c$ and $D = d + s + b$;

• the CC DIS measurements provide two constraints: one on a linear combination of $D$ and
$\bar{U}$ ($e^+p$ data), and one on a linear combination of $U$ and $\bar{D}$ ($e^-p$ data);

• the measurement of $xF_3$ obtained from the difference of $e^+p$ and $e^-p$ NC DIS cross
sections provides a constraint on a linear combination of $U - \bar{U}$ and $D - \bar{D}$.

As a result, good constraints can be obtained on the gluon density as well as on $U$, $D$, $\bar{U}$ and $\bar{D}$
from HERA data alone. The separation between U and D, which in fits to HERA data is provided
by the CC measurements, can be further improved by including e.g. DIS measurements
on a deuterium target. In fits based only on inclusive DIS measurements, separating $U$,
$D$, $\bar{U}$ and $\bar{D}$ further into the individual quark flavours relies mostly on assumptions.
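The origin of the combination $4(U + \bar{U}) + (D + \bar{D})$ can be made explicit with a short sketch. It assumes the leading-order quark-parton model with photon exchange only, where $F_2 = x \sum_q e_q^2 (q + \bar{q})$; Z exchange and higher-order QCD corrections are ignored, and the function name and input values are invented for the illustration.

```python
def f2_leading_order(x, U, Ubar, D, Dbar):
    """Leading-order, photon-exchange F2 from up-type and down-type densities."""
    e_up_sq = 4.0 / 9.0    # squared electric charge of up-type quarks
    e_down_sq = 1.0 / 9.0  # squared electric charge of down-type quarks
    return x * (e_up_sq * (U + Ubar) + e_down_sq * (D + Dbar))


# Hypothetical densities at x = 0.1 (toy numbers, not fitted values).
f2 = f2_leading_order(0.1, U=3.0, Ubar=1.0, D=2.0, Dbar=1.5)
```

Because only the charge-weighted sum enters, $F_2$ alone cannot distinguish up-type from down-type quarks or quarks from anti-quarks, which is why the CC and $xF_3$ measurements listed above are needed.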
3.4.1 Fit of the combined HERA-I inclusive datasets



The HERAPDF1.0 parton densities [61] were extracted using only the averaged H1 and ZEUS
DIS measurements presented in section 2.3.4. The averaged HERA dataset corresponds to 741
cross section measurements: 528 (145) measurements of $e^+p$ ($e^-p$) NC DIS and 34 measurements
each of $e^+p$ and $e^-p$ CC DIS. Following a cut $Q^2 > Q^2_{min}$ with $Q^2_{min} = 3.5$ GeV$^2$,
imposed to remain in the kinematic domain where pQCD is reliable, 592 data points are included
in the QCD fit of [61]. The fit is performed at NLO within the general mass variable
flavour number scheme of [190, 191]. The starting scale is chosen to be slightly below the charm
threshold, $Q_0^2 = 1.9$ GeV$^2$.

The initial parton distributions 18 $xf = xg$, $xu_v$, $xd_v$, $x\bar{U}$ and $x\bar{D}$ are parameterised at $Q_0^2$ using
the generic form:

$xf(x) = A x^B (1 - x)^C (1 + \epsilon\sqrt{x} + Dx + Ex^2)$ (50)

At low x the assumptions $\bar{d}/\bar{u} \to 1$ as $x \to 0$ and $B_{u_v} = B_{d_v}$ are made, which together with
the number and momentum sum rules, remove 6 free parameters of the fit. The strange quark
distribution, $x\bar{s} = f_s x\bar{D}$, is expressed as an x-independent fraction, $f_s = 0.31$, of the down-type
sea at the starting scale.
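As an illustration of how a sum rule removes a free parameter, the sketch below evaluates the generic form of Eq. 50 and fixes the normalisation A of a valence density from the number sum rule (two valence up quarks in the proton). It assumes $\epsilon = D = 0$, in which case the integral reduces to Euler Beta functions. The shape parameters are made-up values chosen for the example, not the HERAPDF1.0 results.

```python
import math


def xf(x, A, B, C, eps=0.0, D=0.0, E=0.0):
    """Generic form A x^B (1-x)^C (1 + eps sqrt(x) + D x + E x^2) for x*f(x)."""
    return A * x ** B * (1.0 - x) ** C * (1.0 + eps * math.sqrt(x) + D * x + E * x * x)


def beta(a, b):
    """Euler Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def valence_normalisation(n_quarks, B, C, E=0.0):
    """A such that the integral of f(x) over 0 < x < 1 equals n_quarks (eps = D = 0)."""
    integral = beta(B, C + 1.0) + E * beta(B + 2.0, C + 1.0)
    return n_quarks / integral


def number_integral(A, B, C, E, steps=200000):
    """Midpoint-rule estimate of the integral of f(x) = xf(x) / x over 0 < x < 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += xf(x, A, B, C, E=E) / x
    return total * h


# Illustrative shape parameters (hypothetical, not fitted values).
B_uv, C_uv, E_uv = 0.7, 4.5, 10.0
A_uv = valence_normalisation(2, B_uv, C_uv, E_uv)
n_valence = number_integral(A_uv, B_uv, C_uv, E_uv)  # numerical cross-check, close to 2
```

The same reasoning applies to the down valence normalisation (one valence down quark) and, through the momentum sum rule, to the gluon normalisation.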

A 9-parameter fit is first performed by setting to zero all $\epsilon$, $D$ and $E$ parameters in Eq. 50.
These parameters are then introduced one by one, the best 10-parameter fit having $E_{u_v} \neq 0$. As
a result 19, the central fit of HERAPDF1.0 is a 10-parameter fit corresponding to the following
parameterisation:

$xg(x) = A_g x^{B_g} (1 - x)^{C_g}$, (51)

$xu_v(x) = A_{u_v} x^{B_{u_v}} (1 - x)^{C_{u_v}} [1 + E_{u_v} x^2]$, (52)

$xd_v(x) = A_{d_v} x^{B_{d_v}} (1 - x)^{C_{d_v}}$, (53)

$x\bar{U}(x) = A_{\bar{U}} x^{B_{\bar{U}}} (1 - x)^{C_{\bar{U}}}$, (54)

$x\bar{D}(x) = A_{\bar{D}} x^{B_{\bar{D}}} (1 - x)^{C_{\bar{D}}}$. (55)



The fit has a $\chi^2$ of 574 for 582 degrees of freedom. Example PDFs at the scale $Q^2 = 10$ GeV$^2$
are shown in Fig. 19, together with their uncertainties obtained as described below.

The experimental uncertainty of the HERAPDF1.0 PDFs (shown as the red band in Fig. 19) is
determined using the Hessian method described in section 3.3.1, taking into account the 110
sources of systematic errors of the individual measurements together with their correlations.
The tolerance criterion $\Delta\chi^2 = 1$ is used to determine the $1\sigma$ error bands. Three additional
errors are included which account for different treatments of the systematic uncertainties in the
averaging procedure of the H1 and ZEUS measurements. These are the largest uncertainties and
are included using the more conservative offset method 20. Very good constraints are obtained
on $U$, $D$, $\bar{U}$ and $\bar{D}$, which are the combinations the measurements are directly sensitive to, in
a large range of x extending down to $10^{-4}$ and up to O(0.1). As expected, the best constraints
are obtained on U, with an uncertainty that remains below 10% for $x \lesssim 0.5$. The gluon PDF is
also well constrained up to $x \sim 0.2$, and down to 21 $x \sim 10^{-4}$.

Figure 19: Distributions of valence quark densities $xu_v$ and $xd_v$, of the gluon density $xg$ and
of the density of sea quarks $xS$ obtained from the fit to the HERA-I dataset, at $Q^2 = 10$ GeV$^2$,
together with one standard deviation uncertainties. From [61].

18 Since $\bar{c}(x) = 0$ and $\bar{b}(x) = 0$ at the chosen starting scale, $\bar{U}(x) = \bar{u}(x)$ and $\bar{D}(x) = \bar{d}(x) + \bar{s}(x)$ at $Q_0^2$.
19 Setting $E_{u_v} \neq 0$ and introducing an eleventh parameter in the fit does not reduce the $\chi^2$ significantly, with
the exception of the fit having both $E_{u_v} \neq 0$ and $D_g \neq 0$. However, the latter leads to very low valence quark
distributions at high x, with in particular $d_v(x) < \bar{d}(x)$. As this would dramatically fail to describe e.g. $\nu d$ fixed
target measurements of the structure function $xF_3$ [82], this solution is discarded for the "central fit". However it
is included in the parameterisation uncertainty discussed below.

20 For the other 110 sources of systematic uncertainties, the offset method yields very similar results to the
Hessian method, since these uncertainties are smaller than the statistical uncertainties.

21 Large uncertainties are actually obtained at low x for the starting scale $Q_0^2 = 1.9$ GeV$^2$, but are quickly
washed out by the DGLAP evolution.

The model uncertainty, shown as the yellow band in Fig. 19, is obtained by varying the input
values of $f_s$, $m_c$, $m_b$ and $Q^2_{min}$ and repeating the fit. The largest effect comes from the variation
of $f_s$ and of the heavy quark masses, which considerably affects the strange and charm densities.

Following [60], an assessment of the additional uncertainty that is introduced by the
parameterisation choice is made in this analysis. This is particularly relevant since the number of
parameters in the central parameterisation of this analysis (Eqs. 51-55) is rather small (10). The
variation of the starting scale $Q_0^2$ is included in this parameterisation uncertainty. Indeed, when
the DGLAP equations are used to perform a backward evolution of the gluon distribution
obtained from the central fit, from $Q_0^2 = 1.9$ GeV$^2$ down to 1.5 GeV$^2$, the resulting function cannot
be fitted by the simple form of Eq. 51. Consequently, repeating the fit with a lower starting scale
$Q_0^2 = 1.5$ GeV$^2$ results in large differences compared to the central fit if the same
parameterisation is kept, because Eq. 51 is not flexible enough to describe the gluon distribution at low
scales. Hence, when the fit is repeated with $Q_0^2 = 1.5$ GeV$^2$, additional freedom is given to the
gluon distribution by subtracting a term $A'_g x^{B'_g} (1 - x)^{C'_g}$ from Eq. 51, as first suggested in [173],
where $C'_g$ is fixed to a large value which ensures that this additional term does not contribute
at high x. Moreover, an additional fit is performed using the parameterisation of the central fit
but relaxing the assumption $B_{u_v} = B_{d_v}$. Alternative 11-parameter fits with $E_{u_v} \neq 0$ are also
considered, including those which lead to good fit quality but peculiar behaviour at large x. An
envelope is constructed, representing the maximal deviation, at each x value, between the
central fit and the fits obtained using these parameterisation variations. This envelope defines the
parameterisation uncertainty of HERAPDF1.0 and is shown separately in Fig. 19 as the green
band. The gluon and sea PDFs at low x are mostly affected by the variation of $Q_0^2$ while at large
x (and over the whole x range for the valence distributions), adding an eleventh parameter in the
fit dominates the parameterisation uncertainty. The total PDF uncertainty is obtained by adding
in quadrature the experimental, model and parameterisation uncertainties. In particular, the
parameterisation uncertainty considerably increases the uncertainty of the gluon PDF at $x \sim 0.1$
and beyond.
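The envelope construction and the quadrature sum described above can be sketched as follows. This is a simplified illustration: it builds a symmetric envelope from the largest absolute deviation at each x point, whereas an actual analysis may keep upward and downward deviations separately. All names and numbers are hypothetical.

```python
import math


def envelope(central, variations):
    """Largest absolute deviation from the central fit at each x point."""
    return [
        max(abs(var[i] - c) for var in variations)
        for i, c in enumerate(central)
    ]


def total_uncertainty(experimental, model, parameterisation):
    """Quadrature sum of the three uncertainty components, point by point."""
    return [
        math.sqrt(e * e + m * m + p * p)
        for e, m, p in zip(experimental, model, parameterisation)
    ]


# Toy PDF values at two x points: a central fit and two alternative fits.
central = [1.00, 2.00]
variations = [[1.10, 1.90], [0.80, 2.05]]
param_unc = envelope(central, variations)

# Toy experimental and model uncertainties at the same two points.
total = total_uncertainty([0.03, 0.04], [0.04, 0.03], param_unc)
```

At each point the envelope is set by whichever alternative fit deviates most, so a single poorly constrained variation can dominate the total uncertainty locally.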

3.4.2 Impact of jet data on the fits to HERA data 

As seen in Fig. 19, the precision on the gluon density is limited at medium and high x when only 
inclusive HERA data are used. Adding HERA data from jet production in DIS and in photo- 
production was shown to lead to better constraints [209]. Although HERA jet data do not bring 
strong constraints on the gluon density at high x due to the limited statistics (better constraints 
at high x are brought by Tevatron jet data), they can be useful for medium x, since HERA jet 
cross sections have small systematic uncertainties (typically 5%, a factor of at least 2 smaller 
than the systematic uncertainties of jet cross sections measured by the Tevatron experiments, 
see section 2.4.3). Compared to the fits performed using only HERA inclusive DIS data [209], the
uncertainty of the gluon density was reduced by a factor of about 2 in the mid-x region, x = 0.01 - 0.4,
when measurements of inclusive jet production at HERA (see Fig. 10) were included. It is
also interesting to note that both fits, with and without the jet data, lead to the same shape for 
the gluon density, indicating that there is no tension between the HERA jet and inclusive DIS 
data. Similar conclusions were reached in preliminary fits using the full statistics of HERA 
data [114]. 

3.4.3 Fits performed using preliminary combinations of HERA data 

Following HERAPDF1.0, several QCD fits have been performed to preliminary combinations 
of HERA data. As for HERAPDF1.0, the extraction of the HERAPDF1.5 PDFs relies on
inclusive DIS data only, but a preliminary combination of H1 and ZEUS measurements from
HERA-I and HERA-II was used instead of the published HERA-I combined dataset. This fit
was also performed with additional freedom given to the gluon and the $u_v$ parameterisation in
Eqs. 51 and 52, leading to 14 free parameters instead of 10 (HERAPDF1.5f). This extended
parameterisation was also used for the extraction of NNLO PDFs, HERAPDF1.5 NNLO. These
fits, although unpublished, are available in the LHAPDF interface. H1 and ZEUS jet data were
added in order to extract the HERAPDF1.6 (NLO) PDFs. The most recent preliminary fit,
HERAPDF1.7, also includes the $F_2^{c\bar{c}}$ measurements and the data taken in 2007 with a lower proton
beam energy. Further details can be found in [114, 210].
Figure 20: Top left: relative uncertainties on example quark densities obtained from the NNPDF
fit to HERA data alone. Top right and bottom: comparison of the results of the fits with and
without the fixed target data for the $u_v$, $D$ and gluon distributions. The contours correspond to
one standard deviation uncertainties.

3.4.4 Impact of fixed target DIS data 

In fits based on HERA data alone, the flavour separation, as well as the separation between
quarks and anti-quarks (i.e. between valence and sea quarks), is provided by the high $Q^2$
measurements of CC cross sections and of $xF_3$ in NC interactions. This separation, in particular at
high x, can be further improved by adding measurements of fixed target DIS experiments to the
fitted data. In particular, the $F_2^d$ measurements of BCDMS and NMC mostly set a constraint on
$4d + u$ which nicely complements that on $4(U + \bar{U}) + (D + \bar{D})$ set by lepton-proton
measurements of $F_2^p$. Moreover, the measurements made in neutrino DIS provide direct access to the
distribution of valence quarks, assuming that nuclear corrections are under control.

In [211], the improved determination of quark distributions brought by the addition of fixed
target measurements in a fit to HERA data was studied within the fitting framework of the ZEUS
experiment. Here, the complementarity between HERA data and fixed target DIS measurements
is illustrated by using the series of fits performed by the NNPDF collaboration, identical to the
NNPDF2.1 analysis but using subsets of experimental data. These fits are described in [177] and
have been released in the LHAPDF package. In particular, a fit has been performed using only
the HERA data (the combined HERA-I NC and CC datasets from H1 and ZEUS, inclusive $e^-p$
HERA-II measurements from ZEUS [121, 125], as well as $F_L$ and $F_2^{c\bar{c}}$ measurements). With
respect to HERAPDF1.0 the fitting method used here largely avoids any parameterisation bias.
A similar fit has been performed by also including data from the fixed target DIS experiments
described in section 2.2.

Figure 20 (top left) shows relative uncertainties on example quark densities resulting from the
fit to HERA data only. While the combination $4(U + \bar{U}) + (D + \bar{D})$ that is directly probed by
the $F_2^p$ measurements is well constrained over the full x range, the uncertainty increases at high
x when one tries to separate up-like from down-like distributions, and quarks from anti-quarks.
The much improved separation of valence and sea densities provided by the fixed target DIS
data is illustrated in Fig. 20 (top right) with the example of the $u_v$ distribution. The distributions
of down-like quarks obtained from the two fits are compared in Fig. 20 (bottom left). Again 
a much better determination at medium and high x is achieved when the fixed target data are 
included in the fit, resulting in a reduced high x distribution. In contrast, the constraints at 
low x are largely coming from HERA data. Figure 20 (bottom right) shows that the gluon 
determination is not improved significantly by the addition of fixed target data in the fit. 

The next section will show how the determination of quark densities at high x, in particular 
the separation between quarks and anti-quarks, can be further improved by including Drell-Yan 
measurements in the fits (section 3.5.3). It should also be noted that this determination will
benefit from the stronger constraints brought by the full $e^-p$ and $e^+p$ HERA-II measurements
at high $Q^2$, which show a much better precision than the HERA-I measurements, for example
on $xF_3$, and from the final HERA combination. The specific impact of the $e^-p$ and $e^+p$
HERA-II data from H1 was studied in [122] within an analysis framework similar to that used for
HERAPDF1.0 and was indeed found to be significant.

3.5 Global QCD fits 

Although HERA data alone can determine the distributions of all partons, albeit with a limited 
precision at high x, the determination of parton densities in the proton is considerably improved 
by including additional datasets. The following datasets are routinely included in "global" 
pQCD analyses of the proton structure. 

As shown in section 3.4.4, the inclusive NC DIS measurements from fixed target experiments
using a deuterium target and the CC DIS measurements from the $\nu N$ experiments improve the
flavour separation and allow better disentanglement between the quarks and the anti-quarks (i.e.
the sea and the valence distributions). The Drell-Yan measurements $pN \to \mu\mu$ mostly constrain
the sea densities. In particular, they set important constraints on the anti-quark densities at
medium and high x, which are not well known from DIS data alone. Comparing the Drell-Yan
cross sections measured in pp and pd provides important constraints on the ratio $\bar{d}/\bar{u}$ at medium
x.

The exclusive production of muon pairs in neutrino-nucleon scattering, $\nu_\mu s \to \mu c \to \mu\mu X$, is
the only pre-LHC process that sets direct constraints on the strange density 22.

Figure 21: Experimental data which enter in the NNPDF global analysis (from [176]).

The Tevatron measurements of inclusive jet cross sections set the strongest constraints on the 
gluon density at high x. The measurements of W and Z production at the Tevatron mostly 
constrain the u and d densities in the valence domain. Since the u density is already well 
constrained by the DIS experiments, they improve our knowledge of the d density and of the 
ratio d/u at medium x. 

The addition of non-HERA data in the QCD analyses typically leads to O(3000) data points
to be included in the fit. Fig. 21 shows how the experimental data included in the NNPDF2.0
and NNPDF2.1 analyses are distributed in the $(x, Q^2)$ plane. These two fits include 2841 points
from DIS experiments (with 743 HERA points), 318 points of Drell-Yan production in fixed
target experiments, 186 points of jet production at the Tevatron, and 70 points of vector boson
production by D0 and CDF.

In this section we mostly discuss results from the MSTW08, CT10 and NNPDF2.1 NLO
analyses. They are based on a similar experimental input and use a GM-VFNS for the treatment of
heavy flavours. The MSTW08 analysis parameterises $g$, the valence quark densities $u_v$ and $d_v$,
the light sea $S = 2(\bar{u} + \bar{d}) + s + \bar{s}$, the asymmetry $\Delta = \bar{d} - \bar{u}$, the total strangeness $s^+ = s + \bar{s}$
and the strange asymmetry $s^- = s - \bar{s}$ at a starting scale $Q_0^2 = 1$ GeV$^2$. The CT10 analysis
parameterises $g$, $u_v$, $d_v$, $\bar{u}$, $\bar{d}$, $s$ at $Q_0^2 = 1.69$ GeV$^2$ and assumes $s = \bar{s}$. The NNPDF2.1
analysis parameterises the gluon density and the six quark and anti-quark light flavours at a scale
$Q_0^2 = 2$ GeV$^2$. In the MSTW08 analysis, $\alpha_s(M_Z)$ is fitted together with the PDF parameters,
while in the CT10 and NNPDF2.1 fits it is set to a constant value.

22 Measurements of multiplicities of strange hadrons, performed by the HERMES experiment [212], should also
constrain the strange PDF, once the experimental observables are corrected for fragmentation effects. However,
they have not yet been included in any global QCD analysis.

Figure 22 compares a few example PDFs at $Q^2 = 2$ GeV$^2$, as extracted from MSTW08, CT10
and NNPDF2.1 23. The agreement for the gluon density at low and medium x is rather good.
In particular, the error band of CT10 is much larger than the one previously predicted from
CTEQ6.6, which used a less flexible parameterisation for the gluon density (see Fig. 18), and
agrees with that obtained with MSTW08 and NNPDF2.1. At high x however, the gluon
densities predicted by the three fits show sizable differences. Moreover, the three fits lead to very
different predictions for the strange densities $s^+$ and $s^-$, although they all use the same
strange-sensitive datasets. The NNPDF fit makes no assumption on the shape of the total strangeness
density $s^+ = s + \bar{s}$, in contrast to the MSTW08 and CT10 fits (see section 3.5.5). This results
in a larger error band, which impacts the uncertainty of other flavour PDFs at low x, especially
that of the down quark, since the main constraint on low x quarks comes from the HERA
measurement of $F_2$, which probes charge weighted sums of quark PDFs. Some differences can
also be seen in the valence distribution, in particular for $x \sim 0.1$. Since the error band of the
NNPDF2.1 fit is not much larger than that of the other fits, it is unlikely that this difference 
comes solely from a parameterisation bias; it could be due to, for example, differences in the 
treatment of nuclear corrections to neutrino DIS data, which set important constraints on the 
valence densities [38]. The next paragraphs show a more detailed comparison of these three 
fits, together with the specific impact of the non-DIS datasets. 

3.5.1 Tevatron data on W and Z production and the d/u ratio 

As shown in section 2.4.3, the shape of the rapidity distribution of W and Z bosons at the
Tevatron provides interesting constraints on the u and d densities at $x \gtrsim 0.01$. The W charge
asymmetry constrains the ratio d/u and its slope. This ratio is otherwise mostly constrained
by the ratio of the deuteron and proton structure functions $F_2$ measured by the NMC experiment,
and by the deuterium measurements of BCDMS. In practice, the Tevatron constraints on d/u are
mostly constraints on the down density, since the density of up quarks is much better known in that range.

CDF Run I data on the W asymmetry in the electron channel [ I 3] have long been included
in the QCD analyses. They are now complemented by more precise Run II data [152,
154-156], some of which [154, 156] are also available in several bins of the lepton transverse
energy. Fig. 23 shows the effect of the Run II measurements from CDF [154] and D0 [155],
corresponding to a luminosity of 170 pb$^{-1}$ and 300 pb$^{-1}$ respectively, on the extracted down
valence distribution. The larger uncertainty obtained in the MSTW08 fit is due to additional
freedom introduced in the $d_v$ parameterisation, compared to the previous fit MRST2006 [208].
A change in the shape is clearly visible, with a significant increase in $d_v$ for $x \sim 0.3$, which is
compensated (because of the number sum rule) by a decrease for lower x values.

Figure 22: Comparison of recent global fits at $Q^2 = 2$ GeV$^2$, for the singlet density $\Sigma = \sum_q (q + \bar{q})$,
the gluon density (from [177]), the total strangeness $s^+ = s + \bar{s}$, the strange asymmetry
$s^- = s - \bar{s}$ and the total valence distribution. Contours at 68% confidence level are shown.

Figure 23: Comparison of $d_v$ as extracted from the MSTW analysis, when taking into account
the CDF [154] and D0 [155] Run II W asymmetry data (MSTW08), or when using only the
CDF Run I measurements (MRST 2006). From [173].

23 The NNPDF2.1 analysis was affected by an error in the calculation of di-muon production in neutrino DIS
scattering, which had a significant effect on the strange distribution. This error has been corrected in [213] and in
the fit labelled NNPDF2.3-noLHC, which is based on the same experimental input as NNPDF2.1 and is shown
in Fig. 22 for the $s^+$ and $s^-$ distributions.

It is interesting to note that the inclusion of the measurements of W charge asymmetry made
by D0 in several $E_T$ bins using 750 pb$^{-1}$ of Run II data [156] is problematic, as consistently
observed by all analyses. These data show some incompatibility with the DIS structure function
data, in particular the NMC measurement of the ratio of the proton and deuteron structure
functions $F_2$ and the BCDMS $F_2^d$ measurement, and they
also show some tension within themselves. The MSTW08 analysis decided not to include the
D0 [156] and CDF [152] high luminosity Run II measurements in their fit, pending further
investigation. The D0 Run II data of [155, 156] are also excluded from the NNPDF analyses
and from the main CT10 fit. However the CTEQ group also provides a fit with these data
included (CT10W), obtained by artificially increasing the weight of these datasets in the global
fit. Figure 24 compares the d/u ratios obtained from these two fits. The ratio obtained from
the CT10W fit has a markedly different slope at x > 0.01, and a much reduced uncertainty as
compared to CT10. While the outcome of the CT10W fit has to be used with care until the 
compatibility with other data is better understood, this shows the potentially large implications 
that precise W asymmetry data from the Tevatron can have on the d/u ratio, and hence on the 
down density at large x. 

3.5.2 The asymmetry of the light sea 

The combination of constraints from muon-proton and muon-deuteron DIS, from HERA data,
and from neutrino DIS data, is not enough to determine the light sea asymmetry $\bar{d} - \bar{u}$, which
is very loosely constrained by a fit which includes DIS data only. The inclusion of Drell-Yan
data (mostly proton and deuteron fixed target data) dramatically improves this determination,
as shown in Fig. 25. Including Tevatron data in addition does not significantly reduce the
uncertainty further. The figure was obtained from a series of fits performed by the NNPDF
collaboration, identical to the NNPDF2.1 analysis but using subsets of experimental data, which
have been released in the LHAPDF package.

Figure 24: The d/u ratio obtained from the latest CT10 fit without including the Run II D0
measurements on the W charge asymmetry (left), and with these measurements included (right).
The ratio is normalised to that derived from the previous CTEQ fit (CTEQ6.6). The uncertainty
bands correspond to two standard deviations. From [175].

Figure 25: The asymmetry of the light sea $\Delta_s(x) = \bar{d}(x) - \bar{u}(x)$ and its one standard deviation
uncertainty at Q = 2 GeV as obtained from the NNPDF analysis, when only DIS data are
included in the fit (yellow contour), when Drell-Yan data are included in addition (red hashed
contour), and from the reference NNPDF2.1 fit (blue hashed contour).
3.5.3 Quarks and anti-quarks at high x



The Drell-Yan measurements from fixed target experiments are also extremely useful to
constrain the quark and anti-quark densities at high x. This is illustrated in Fig. 26, which again
uses the NNPDF2.1 sets released in the LHAPDF interface. The HERA data alone provide
very few constraints on the anti-quark densities at $x \sim O(0.1)$. Fixed target neutrino DIS
experiments, which measured both $\nu N$ and $\bar{\nu} N$ cross sections, provide a separation between the
valence and the sea quarks at high x. As a result, a fit to the full DIS data reduces the uncertainty
on the anti-quark densities at high x. However, the resulting uncertainty on $\bar{d}(x)$ remains large,
e.g. about 40% at $x \sim 0.2$. With the addition of the fixed target Drell-Yan data, this uncertainty is
reduced to about 10%. The other datasets included in NNPDF2.1 do not further reduce the
uncertainties.

Figure 26 also shows the uncertainties obtained in a fit using only data from the collider
experiments (H1 and ZEUS, D0 and CDF). Although the Tevatron data help constrain the $\bar{d}(x)$
distribution at high x, their impact is not as large as that of the Drell-Yan data, and their impact
on the uncertainty of $\bar{u}(x)$ at high x is marginal. The measurement of high mass di-lepton
production at the LHC will bring further constraints on anti-quarks at high x, assuming
that effects of physics beyond the Standard Model do not distort the mass spectrum 24.

Figure 26: One standard deviation uncertainties on $\bar{u}(x)$ (left) and on $\bar{d}(x)$ (right) at $Q^2 = 10^4$ GeV$^2$
for x between 0.02 and 0.6, as obtained from the NNPDF2.1 global fit (filled area),
and from the same fit but applied to a subset of experimental data.

24 Several new phenomena may lead to an enhancement or to a reduction of the production of high mass
di-lepton pairs at the LHC, e.g. qqll contact interactions, quark substructure, or "towers" of Kaluza-Klein gravitons
in models with large extra spatial dimensions.
3.5.4 Tevatron jet data and the gluon distribution

The inclusive jet production at the Tevatron experiments is very sensitive to the gluon density
at high x. Early CDF Run I measurements reported in 1995 [159] indicated an excess of high
$p_T$ jets compared to NLO QCD predictions based on the PDFs available at that time. The
possibility that this excess could be an indication of quark compositeness created considerable
excitement, until it was shown [214] that these jet data could be satisfactorily accommodated in
a global fit, which resulted in a larger gluon density at high x. While this showed the important
role of jet constraints on the gluon density, it also triggered intensive efforts to provide
PDFs with associated uncertainties, which led to the state of the art presented above.

The inclusive jet production cross sections measured by D0 and CDF have been included in the
global QCD analyses since CTEQ4 [214], MRST2001 [215] and NNPDF2.0. Since CT10 and
MSTW08, the Run II measurements are used in place of the Run I results. Indeed, these new
datasets have much higher statistics and smaller systematic uncertainties, and the experiments
have provided the full correlation matrix of systematic errors. These Run II measurements
prefer a smaller high x gluon distribution than the Run I data.

Figure 27 shows the impact of Tevatron jet data on the determination of the gluon density. A fit 
similar to that of NNPDF2.1 has been performed, using DIS data only, and it is compared to the
standard fit of NNPDF2.1. At low x, both fits lead to a very similar gluon density, with the same
relative uncertainty, meaning that most of the constraints on the gluon density are coming from 
DIS data (mainly HERA). However, at medium and high x, the non-DIS datasets (mainly the 
jet measurements from the Tevatron experiments) provide a significantly improved uncertainty 
on the gluon density. 




Figure 27: Comparison of the gluon density obtained from NNPDF2.1 and from a similar 
fit restricted to the DIS data, at low and medium x and Q = 2 GeV (left), and at high x 
and Q = 100 GeV (right). The ratio to the NNPDF2.1 density is shown, together with 68% 
confidence level contours. 

3.5.5 The strange sea



As seen in section 2.2.4, the exclusive production of di-muon events in neutrino DIS experiments,
measured by the NuTeV and CCFR experiments, sets constraints on the density of
strange quarks for $x \gtrsim 10^{-2}$ and $x \lesssim 0.3 - 0.4$, via the subprocess $W s \to c$. The recent inclusion
of these data in global fits allows the strange content of the nucleon to be studied in more detail
[173,216]. Previous fits assumed that $s + \bar{s}$ was a constant fraction of the non-strange sea
$\bar{u} + \bar{d}$ at the starting scale,

$$ s + \bar{s} = \kappa_s (\bar{u} + \bar{d}) $$

with $\kappa_s \sim 0.4 - 0.5$ reflecting the suppressed probability to produce $s\bar{s}$ pairs compared to $u\bar{u}$
or $d\bar{d}$ pairs. Recent fits, however, give more freedom to the strange density, in particular at high
x. This additional freedom does lead to an improved $\chi^2$.

Both the CTEQ6.6 and the MSTW08 analyses assume that, at low x, the strange density follows
the same power-law as the light sea density, i.e. $s + \bar{s} \propto x^{\alpha}$ with $\alpha$ set to the low-x power of
the light sea (MSTW08) or to that of the $\bar{u}$ and $\bar{d}$ densities (CTEQ6.6). They parameterise the
strange density $s + \bar{s}$ as

$$ x s + x \bar{s} = A \, x^{\alpha} (1 - x)^{\beta} P(x) $$

and fit the normalisation A and the high-x power $\beta$. Parameters defining the polynomial function
P(x) are also fitted in the CTEQ6.6 analysis, while in MSTW08 they are fixed to be
the same as those defining the P(x) of the total sea. The CTEQ6.6 analysis still assumes
$s = \bar{s}$, while the MSTW08 analysis parameterises the strange asymmetry as
$x s - x \bar{s} = A \, x^{\alpha} (1 - x)^{\beta} (1 - x/x_0)$, where $x_0$ is given by the number sum rule of zero strangeness, and
$\alpha$ is fixed to 0.2 as the data do not constrain A and $\alpha$ independently.

Figure 28 (left) shows the strange density obtained in the two fits, at $Q^2 = 5$ GeV$^2$. In the
MSTW08 analysis, $s + \bar{s}$ is smaller than $(\bar{u} + \bar{d})/2$, especially 25 at large x. The strange density
from the CTEQ6.6 fit is considerably larger, even in the range $10^{-2} < x < 10^{-1}$ which is
directly constrained by the data. The larger uncertainty obtained in the CTEQ6.6 analysis is
probably due to the more flexible parameterisation. The uncertainty from the NNPDF2.1 fit,
where any parameterisation bias is largely removed, was seen to be even larger (see Fig. 22).
Note that the small uncertainty obtained in MRST2001 is due to the assumption $s + \bar{s} = \kappa_s (\bar{u} + \bar{d})$
that was made in that fit.

The strange asymmetry $s - \bar{s}$ is actually very loosely constrained, as shown in Fig. 28 (right).
The existing data seem to indicate a positive value for the momentum asymmetry $\int dx \, x (s - \bar{s})$
of the strange sea. This asymmetry has important consequences for the $\sin^2\theta_W$ anomaly
reported by the NuTeV collaboration [217]. From the asymmetry between $\sigma(\nu_\mu N \to \nu_\mu X)$
and $\sigma(\bar{\nu}_\mu N \to \bar{\nu}_\mu X)$, NuTeV extracted a value for $\sin^2\theta_W$ that is $3\sigma$ above the global average.
Half that discrepancy can be explained by isospin violations [215], and a positive value for
$\int dx \, x (s - \bar{s})$ would further reduce this NuTeV anomaly.

25 The strange quark mass could explain this additional suppression at high x, as this corresponds to low $W^2$,
i.e. close to the production threshold.

Figure 28: Left: The ratio of the input $s + \bar{s}$ distribution to that of the non-strange sea $\bar{u} + \bar{d}$ at
$Q^2 = 5$ GeV$^2$, as obtained in the MSTW08, MRST2001 and CTEQ6.6 fits, with $2\sigma$ uncertainty
contours. Right: The strange asymmetry $x s - x \bar{s}$ at $Q_0^2 = 1$ GeV$^2$, together with the $1\sigma$
uncertainty. From [173].



3.5.6 Compatibility between the datasets 

Except for some datasets of electroweak boson production at the Tevatron discussed in sec- 
tion 3.5.1, the global QCD analyses find in general a very good consistency of all datasets with 
each other and with NLO QCD. 

Some amount of tension is however observed between the $F_2$ data of fixed target $\mu p$ experiments
and the rest of the data, although not consistently in all analyses. In the NNPDF and CTEQ
analyses, the $\chi^2$ of the NMC $F_2$ data is a bit large. Since this was already the case in the
early NNPDF analyses where a parameterisation of the structure function $F_2$ was constructed
without using pQCD, this could reflect the fact that the data within this dataset show point-by-point
fluctuations which are larger than what is allowed by their declared uncertainty [176].
Some tension is also seen in the MSTW08 analysis between the BCDMS $F_2$ data and the rest
of the data, with the BCDMS $\mu p$ data tending to prefer a higher gluon at high x in order to
accommodate the observed $Q^2$ dependence. A similar observation was made in [218] within
the framework of the CTEQ analysis. As the degree of compatibility between the BCDMS
data and the rest of the data becomes better when a higher $Q^2_{\rm min}$ cut is applied, this may be an
indication of non-perturbative effects in these $\mu p$ data at low $Q^2$, and/or of deviations from
NLO DGLAP in the HERA measurements at very low x.

Some inconsistencies are also observed with the neutrino DIS data. Discrepancies between the 
NuTeV and the older CCFR structure function measurements at high x are now understood by 
both groups, and the NuTeV dataset is believed to be more reliable (see section 2.2.4). However, 
the CHORUS measurements (obtained with a lead rather than an iron target) also disagree with 
the NuTeV data at high x. As a result, the MSTW08 analysis includes the NuTeV and CHORUS 
data (which replace the CCFR measurements) only for x < 0.5. These NuTeV and CHORUS 
data were analysed together with the latest Drell-Yan measurements from E866 in [219], in a 
global fit similar to those performed by the CTEQ collaboration. This fit yields a d/u ratio 
which flattens out significantly at high x. A tension is observed at high x: the NuTeV data 
pull the valence distributions upward (which pulls against the BCDMS and NMC data), while
the E866 measurements prefer lower valence distributions at high x. This tension is actually 
amplified by the nuclear corrections applied to the NuTeV data. 



3.6 Fits and calculations at NNLO 

3.6.1 Status of NNLO fits 

Over the past years, an increasing number of QCD calculations have become available at NNLO 
with the goal of reducing the scale uncertainties on the resulting predictions compared to their 
NLO counterparts. Consequently, parton densities are now extracted at NNLO by several 
global analyses. NNLO PDFs have been published for the MSTW08, NNPDF2.1 (in [220]) 
and NNPDF2.3, ABKM09 [179] and ABM11 [81] fits. Preliminary NNLO PDFs have also
been extracted within the HERAPDF framework (HERAPDF1.5) and the CT group plans to
release an NNLO set soon [221].

The NNLO analyses of ABKM09 and ABM11 include DIS data and Drell-Yan measurements
from fixed target experiments. The MSTW08 and NNPDF analyses include additional datasets
as they use the same data as in their NLO fits. However, approximations have to be made in
order to include the Tevatron jet data, since the full NNLO corrections to jet cross sections are
not available yet. Both groups use the approximate NNLO calculation obtained from threshold
resummation [222] and implemented in the FastNLO package. ABM11 also used this approach
in fits made to check the impact that the Tevatron jet data would have on their analysis [81],
but their central fit is restricted to the datasets for which the theoretical calculation is exact. This
approximation is however believed to be robust as the threshold correction should be the only
source of large NNLO corrections [173].

Figure 29 compares the gluon densities at $Q^2 = 2$ GeV$^2$, as obtained by the NNPDF2.1,
MSTW08 and ABKM09 analyses. At that low scale, the number of flavours is three in the
GM-VFNS analyses of MSTW08 and NNPDF2.1, hence the $n_f = 3$ set of the FFNS analysis
of ABKM09 is used for the comparison. Sizable differences can be observed in Fig. 29.
The gluon distribution of ABKM09 is markedly different from that of MSTW08, which at low
scales becomes negative at low x. Part of the differences seen at low x can be due to the fact
that MSTW08 and ABKM09 use the individual H1 and ZEUS data while NNPDF2.1 uses the
more precise combined dataset. Indeed, the NNLO gluon density obtained from the ABM11 fit,
which uses the combined HERA-I dataset, is lower than that of ABKM09 and in better agreement
with that of NNPDF2.1. The lower ABKM09 gluon distribution at high x may come from
the fact that the central NNLO fit of ABKM09 does not include the Tevatron jet data. For other
densities, the agreement between the central values is in general better, although the uncertainty
bands are different, being larger for NNPDF2.1. Examples of NNLO predictions and of their
PDF uncertainties for benchmark processes at the LHC will be shown in chapter 4.

3.6.2 The convergence of perturbative series and low x effects 

For processes that do not involve low x partons, calculating the cross sections to LO, NLO and 
NNLO shows a reasonable convergence of the perturbative series. For example, the NNLO 

Figure 29: Comparison of the gluon density at $Q^2 = 2$ GeV$^2$ obtained in the NNPDF2.1,
MSTW08 and ABKM09 NNLO analyses. The NNPDF2.1 density is shown for $\alpha_s = 0.119$
($\alpha_s = 0.114$) in the top (bottom) row, such that it can be compared with the overlaid MSTW08
(ABKM09) density. Note that the ABKM uncertainties also include the uncertainty on $\alpha_s$ while
for NNPDF and MSTW they are pure PDF uncertainties. From [220].

cross section for W and Z production at the LHC, obtained from the NNLO MSTW08 PDFs,
is only 3-4% higher than the NLO cross section obtained from the NLO MSTW08 PDFs.
However, for processes involving low x partons, convergence may not be reached at NNLO.
This is illustrated in Fig. 30 which shows the Drell-Yan cross sections at LO, NLO and NNLO,
in four mass bins. For di-lepton masses smaller than a few tens of GeV, the NNLO and NLO predictions
differ substantially, even in the central region, the difference being larger for smaller
masses. This may indicate that, in part of the kinematic range where the LHC experiments
will make measurements (for example, LHCb should measure Drell-Yan at low masses in the
rapidity range 2 < y < 5, see chapter 4), a resummation of terms in $\ln(1/x)$ may be needed.

The measurement of the longitudinal structure function $F_L$ at HERA [123, 124] provides another
test-bench for low x effects that are not accounted for in the NNLO DGLAP equations. A 
resummed calculation was shown to best describe the data [38, 224]. However, as shown by 
Fig. 31, the fixed order DGLAP predictions are, in general, in reasonable agreement with the 
measurement, within the rather large uncertainties. 

The need for $\ln(1/x)$ resummations was also investigated by studying exclusive final states,
such as forward pions or forward jets at HERA. The measurements were compared to fixed 

Figure 30: Drell-Yan cross section at the LHC in several mass bins, at LO, NLO and NNLO (panel title: $\gamma^*/Z$ rapidity distributions at LHC).
From [223].

order DGLAP predictions, and to predictions based on the BFKL equation [225, 226], which 
involve un-integrated parton densities 26 . No conclusive evidence for effects of BFKL dynamics 
was observed. 

Besides resummations, the evolution of PDFs at very low x is expected to be affected by sat- 
uration effects, due to parton recombinations. Saturation would lead to a taming of the rise of 
$F_2$ at low x. Such an effect has been looked for in HERA data, by investigating the slopes of
$F_2$ [123], but was not observed in the x and $Q^2$ range of the measurements. A possible hint
for saturation in HERA data may come from the energy dependence of diffractive interactions, 
which was seen to be the same as that of the total cross section [227]. These aspects can be ad- 
dressed within dipole models (see [228, 229] and references therein); a deeper discussion goes 
beyond the scope of this review. 

26 The un-integrated gluon density needed for these predictions is calculated from the usual gluon density via

$$ \int^{Q^2} \frac{dk_t^2}{k_t^2} \, x f(x, k_t^2) = x g(x, Q^2). $$







Figure 31: The longitudinal structure function $F_L$ measured by the H1 experiment, and compared
with various NLO and NNLO predictions. From [123].



4 PDF Constraints from the LHC 



The LHC pp collision physics programme is driven by the search for new physics and the understanding
of electroweak symmetry breaking. Precise theoretical predictions of background processes
are needed for a discovery, whereas accurate predictions of new phenomena are needed
for the interpretation of exotic physics signals or for verification of the Higgs boson properties.
This programme is now well under way with about 5 fb$^{-1}$ of luminosity delivered to the ATLAS
and CMS experiments in 2011, and more than 20 fb$^{-1}$ at $\sqrt{s} = 8$ TeV expected by the first long
shutdown of the LHC in 2013. As discussed in section 2.4, measurements from the Tevatron $p\bar{p}$
collider provide important PDF constraints beyond those obtained from DIS data. Similarly, it
is expected that measurements from the LHC experiments will also improve our knowledge of
proton structure.



4.1 The LHC experiments 

The kinematic region opened up to the ATLAS, CMS and LHCb experiments 27 in the initial
phase of LHC operation at $\sqrt{s} = 7$ TeV is shown in Fig. 32. The lowest Q is set by available
trigger thresholds and the lowest x is determined by detector rapidity (y) acceptance. The
ATLAS and CMS experiments are largely limited to $|y| < 2.5$. For W/Z production from
partons with momentum fractions $x_1$ and $x_2$ (here, $x_1 < x_2$ by convention), $M^2_{W,Z} = s x_1 x_2$ and
the boson rapidity is given by Eq. 32. This restricts the x range at $\sqrt{s} = 7$ TeV to approximately
$10^{-3} < x < 10^{-1}$. In contrast, the LHCb experiment with more forward instrumentation is able
to access the region $2 < y < 5$, which corresponds to $10^{-4} < x_1 < 10^{-3}$ and $0.1 < x_2 < 1$ for
the same $Q = M_{W,Z}$. The overall reach in x will be extended by a further factor of two with
$\sqrt{s} \simeq 14$ TeV operation expected by 2015.
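
The x ranges quoted above follow from leading-order kinematics. The sketch below is a minimal illustration, assuming the standard relations $x_{1,2} = (M/\sqrt{s})\, e^{\mp |y|}$, which satisfy $M^2 = s x_1 x_2$ and $|y| = \frac{1}{2} \ln(x_2/x_1)$ with $x_1 < x_2$. The function name and the W mass value of 80.4 GeV are illustrative choices and are not taken from the text.

```python
import math

def momentum_fractions(mass_gev, rapidity, sqrt_s_gev):
    """Leading-order parton momentum fractions (x1, x2) with x1 <= x2.

    Assumes x1 * x2 * s = M^2 and |y| = 0.5 * ln(x2 / x1).
    """
    ratio = mass_gev / sqrt_s_gev
    x1 = ratio * math.exp(-abs(rapidity))
    x2 = ratio * math.exp(abs(rapidity))
    if x2 > 1.0:
        raise ValueError("rapidity is beyond the kinematic limit ln(sqrt(s)/M)")
    return x1, x2

M_W = 80.4       # GeV, illustrative value
SQRT_S = 7000.0  # GeV

for y in (0.0, 2.5, 4.0):
    x1, x2 = momentum_fractions(M_W, y, SQRT_S)
    print(f"y = {y:3.1f}: x1 = {x1:.2e}, x2 = {x2:.2e}")
```

Within $|y| < 2.5$ both fractions stay roughly between $10^{-3}$ and $10^{-1}$, while at larger rapidity the smaller fraction drops towards $10^{-4}$ as the larger one approaches unity.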

27 The LHC experiment ALICE whose main goal is the study of heavy ion physics will not be discussed in this 
article. 

Figure 32: Kinematic phase space accessible at the LHC with pp collisions at $\sqrt{s} = 7$ TeV (panel title: 7 TeV LHC parton kinematics).
From [230], adapted from [231].

4.1.1 The ATLAS and CMS detectors 

The ATLAS [232] and CMS [233] detectors are designed as multi-purpose experiments to exploit
the full physics potential at the LHC. They are segmented into a central barrel part and two
endcap regions. The innermost part of the detectors consists of precision silicon pixel and strip
tracking detectors close to the nominal interaction points providing charged particle momentum
reconstruction over the region $|\eta| < 2.5$. For ATLAS the silicon trackers are supplemented by
a surrounding straw-tube transition radiation tracker for $|\eta| < 2.0$ to enhance electron identification.
Both detectors have a large solenoid field axial with the LHC beamline. The 2 T field
in the case of ATLAS encloses the tracking and the electromagnetic calorimeter, whereas the
trackers, and both electromagnetic and hadronic calorimeters are immersed in the 3.8 T field of
CMS.

The ATLAS electromagnetic and hadronic calorimeters extend to $|\eta| < 3.2$ and use a combination
of liquid argon and tiled scintillator technologies as the active media for energy sampling.
A very forward calorimeter provides additional coverage for $3.2 < |\eta| < 4.9$. The CMS electromagnetic
and hadronic calorimeters use lead-tungstate crystals and scintillating plates for
energy sampling respectively, in the region $|\eta| < 3.0$, and are supplemented by an additional
sampling Cherenkov calorimeter in the forward region covering $3.0 < |\eta| < 5.0$.

Muons are measured in detectors located outside of the magnet solenoids. The CMS design
places the detectors inside a steel return yoke for the solenoid which provides a bending field
for the muons and covers $|\eta| < 2.4$. The ATLAS muon spectrometer uses three large superconducting
toroid magnet systems and is able to measure muons over the range $|\eta| < 2.7$.

4.1.2 The LHCb detector 

The LHCb detector [234] is primarily designed to study properties of B-meson decays at the
LHC, including CP violation and observation of rare B decays. It also has a programme of
QCD and EW physics measurements which are of relevance to this article. The detector is
a single arm forward spectrometer covering the region $2 < \eta < 5$. A precision silicon strip
vertex detector is located close to the interaction region. Further strip silicon tracking devices 
are located on either side of a dipole magnet, supplemented by straw-tube tracking chambers. 
A ring imaging Cherenkov detector is used to help identify charged hadrons. Electromagnetic 
and hadronic calorimeters located downstream of the magnet distinguish electrons, photons and 
hadrons. Muons are detected in multi-wire proportional chambers furthest from the interaction 
region. 

4.2 Benchmark cross section predictions 

Parton luminosities are a convenient means of estimating the PDF contributions to, and the
$\sqrt{s}$ dependence of, hadronic cross sections for given combinations of partons [235]. The parton
luminosity for the combination $\sum_q (q + \bar{q})$ is relevant, for example, for $Z^0$ production, whereas the
combination gg is of importance for Higgs production at the LHC. Using Eq. 31, $\tau = x_1 \cdot x_2 = \hat{s}/s$
where $\hat{s}$ is the partonic centre-of-mass energy squared, the differential luminosities $\partial\mathcal{L}/\partial\hat{s}$ are defined
as:



The ratios of several NLO parton luminosities to that of MSTW08 are compared in Fig. 33. Very
good agreement between all sets is attained in the W, Z resonance region, but the sets diverge rapidly at
higher or lower fractional partonic centre-of-mass energy. The level of agreement is similar for
the gg combination, which shows a large spread of predictions that are in some cases outside
the uncertainty bands of some of the predictions. This has led to a debate on the best way to
estimate PDF uncertainties for cross section predictions incorporating the spread between PDF
sets, and is discussed below.
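
A minimal numerical sketch of a gg luminosity is given below. It assumes the commonly used form $\partial\mathcal{L}_{gg}/\partial\hat{s} = \frac{1}{s} \int_\tau^1 \frac{dx}{x}\, g(x)\, g(\tau/x)$ and ignores the scale dependence of the PDF. The gluon density used here is a toy shape with invented parameters, not a fitted PDF, and the function names are hypothetical.

```python
import math

def toy_gluon(x, norm=3.0, low_x_power=0.2, high_x_power=5.0):
    """Toy gluon density g(x) (not x*g(x)); parameters are invented for illustration."""
    return norm * x ** (-1.0 - low_x_power) * (1.0 - x) ** high_x_power

def gg_luminosity(tau, s, pdf=toy_gluon, n_steps=2000):
    """Assumed form: dL/ds_hat = (1/s) * integral_tau^1 dx/x g(x) g(tau/x).

    Integrated with the midpoint rule in u = ln(x), so that dx/x = du.
    """
    u_min = math.log(tau)
    du = -u_min / n_steps
    total = 0.0
    for i in range(n_steps):
        x = math.exp(u_min + (i + 0.5) * du)
        total += pdf(x) * pdf(tau / x) * du
    return total / s

s = 7000.0 ** 2  # GeV^2
for sqrt_s_hat in (91.2, 500.0, 2000.0):
    tau = sqrt_s_hat ** 2 / s
    print(f"sqrt(s_hat) = {sqrt_s_hat:7.1f} GeV: dL/ds_hat = {gg_luminosity(tau, s):.3e} GeV^-2")
```

Ratios of such luminosities between PDF sets, evaluated at the same $\hat{s}$, are the quantities compared in plots of the kind shown in Fig. 33.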

A series of benchmark cross sections has been calculated at NLO and NNLO [236] in order
to review the consistency of the most current PDF sets available (MSTW08, CTEQ6.6, CT10,
NNPDF2.1, HERAPDF1.0, ABKM09 and GJR08). The chosen processes are W, Z and $t\bar{t}$
production cross sections, as well as Higgs production with masses of $M_H = 120, 180, 240$
GeV. The cross sections are determined for fixed values of $\alpha_s$.

An example of the NLO benchmark predictions is shown in Fig. 34. Each point is plotted at the
value of $\alpha_s$ used in the central fit of the analysis. The dashed curves show the $\alpha_s$ variation using





Figure 33: Parton luminosities for the LHC at 7 TeV for the combination $\sum_q (q + \bar{q})$ (left) and gg
(right). From [236].

Figure 34: Predicted NLO cross sections for the LHC at $\sqrt{s} = 7$ TeV for $W^+ + W^-$ production
(upper left), $Z^0$ production (upper right), $gg \to H$ for $M_H = 120$ GeV (lower left) and $t\bar{t}$
production (lower right) with 68% confidence level uncertainties as a function of $\alpha_s$. The inner
vertical error bar corresponds to the PDF uncertainty and the outer error bar includes the $\alpha_s$
uncertainty. The horizontal error bar shows the $\alpha_s(M_Z^2)$ range considered for the uncertainty.
From [236].

alternative PDFs from each group. The precision of the absolute cross section predictions at
68% CL is broadly similar for each PDF set, i.e. ~2-3% for $W^+ + W^-$ production and ~2%
for Z production. It is however apparent that these uncertainties do not fully cover the spread
of predictions, which is ~6%. As expected, the ratio of the $W^+ + W^-$ and Z cross sections (not shown)
is obtained with greater precision and shows a smaller spread. This is due to the fact that the
numerator and denominator of the ratio are both largely sensitive to the PDF combination u + d,
and that the $\alpha_s$ dependence almost cancels.

In addition to these variations the theoretical uncertainty should also take into consideration
the effects of neglected higher orders. These are usually estimated by varying the $\mu_F$ and $\mu_R$
scales within a factor of two of the default choice, usually taken to be $\mu_F = \mu_R = M_{Z,W,H}$. The
influence of the scale uncertainty depends on the cross section under study, and can be 3% for
Z production at NLO but is dramatically reduced to 0.6% at NNLO [236].

Production of $t\bar{t}$ pairs at the LHC for $\sqrt{s} = 7$ TeV is dominated by gg initial states, which
account for 80% of the production cross section, and at threshold probes $x \sim 2 m_t/\sqrt{s} = 5 \times 10^{-2}$
[236]. By contrast, W and Z resonant production is dominated by $q\bar{q}$ pairs probing $x \sim 2 \times 10^{-2}$.
Thus W, Z cross sections are anti-correlated with $t\bar{t}$ production since an enhanced
gluon distribution at higher x would lead to a reduced quark distribution at lower x through
the sum rules [231]. The predictions for $t\bar{t}$ production are as yet only approximately known
at NNLO. The predictions with different PDF sets [236] are calculated at NLO and NNLO
and show a considerable range of about ±10%, which is larger than the uncertainties estimated
from a single PDF set at 68% CL. The 90% CL uncertainty bands give a better reflection of the
variation in the predictions.

In Fig. 35 a comparison of production cross sections for the SM Higgs boson is shown (as a ratio 
to the MSTW08 prediction) for a range of $M_H$. The NLO predictions each have an uncertainty
of ±3% (at 68% CL and including the uncertainty on $\alpha_s$) although the spread between different
PDF sets can be as large as 10%. At NNLO the uncertainty bands are marginally larger, and 
the spread of predictions is considerably larger than at NLO. However, scale uncertainty is not 
included in the error bands shown, and is reduced by a factor of two to about 9% at NNLO [237, 
238]. 

These studies have been discussed in detail within the PDF4LHC working group [239]. The
group has made a recommendation on how to determine NLO and NNLO PDF uncertainties for
cross section predictions which takes into account the spread between the PDF groups [240].
At NLO the prescription is based on the MSTW08, CTEQ6.6 and NNPDF2.0 PDF sets which
are commonly used by the LHC experiments (although CTEQ6.6 is now superseded by CT10,
and NNPDF2.0 by NNPDF2.3). The recommendation is to calculate the envelope of the three
groups' PDF+$\alpha_s$ uncertainties, with the mid-point taken as the central value. At NNLO the recommendation
is based on the MSTW08 PDFs where the uncertainty of this set is increased by
a scale factor obtained from the ratio of the NLO PDF4LHC uncertainty band to the MSTW08
NLO error band. This factor is found to be ~2 for the $gg \to H$ process at the LHC.
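
The NLO envelope prescription described above can be sketched as follows. The function name is hypothetical and the numbers in the usage example are invented placeholders, not actual predictions of any PDF set.

```python
def envelope(predictions):
    """Combine predictions given as (central, err_up, err_down) tuples.

    Returns (central, half_width): the envelope spans the lowest lower edge
    to the highest upper edge, and its mid-point is taken as the central value.
    """
    predictions = list(predictions)
    upper = max(central + up for central, up, _ in predictions)
    lower = min(central - down for central, _, down in predictions)
    return 0.5 * (upper + lower), 0.5 * (upper - lower)

# Placeholder cross sections in pb with PDF+alpha_s uncertainties (invented values)
groups = {
    "set A": (10.0, 0.3, 0.3),
    "set B": (10.4, 0.4, 0.2),
    "set C": (9.8, 0.2, 0.5),
}
central, half_width = envelope(groups.values())
print(f"envelope: {central:.2f} +/- {half_width:.2f} pb")
```

Under the same assumptions, the NNLO scale factor mentioned in the text would be the ratio of this half-width to the NLO uncertainty of the single reference set.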

It is argued [ 36] that the prescription given above may be overly complex to apply to all pro- 
cesses, for example in a process where the theoretical uncertainty is dominated by scale varia- 
tions. In some cases it may be easier and statistically more correct to evaluate the uncertainties 
according to the prescription of one PDF group using the 90% CL uncertainty. For example 

Figure 35: Predicted cross sections at NLO (left) and NNLO (right) for Higgs production with
68% confidence level uncertainties as a function of $M_H$. The ratio to MSTW08 prediction is
shown. From [236].



the NNLO uncertainties evaluated using the MSTW08 PDFs (and their prescription) including
scale uncertainties, $\alpha_s$ variations and the choice of b and c quark masses are found to be
and ^4 7% for Z production at 68% and 90% CL respectively.



4.3 First LHC measurements 
4.3.1 Electroweak measurements 

The initial measurements of the W and Z production total and differential cross sections have
now been published by ATLAS [241], CMS [242-244] and LHCb [245]. The Z production
cross section is sensitive to the dominant combinations $u\bar{u} + d\bar{d} + s\bar{s}$, whereas $W^+$ probes
$u\bar{d} + c\bar{s}$ and $W^-$ probes $d\bar{u} + s\bar{c}$. Thus the flavour structure of the proton is accessible via
measurements of $W^+$ and $W^-$ production, or through the W lepton charge asymmetry $A(\eta)$:

$$ A(\eta) = \frac{d\sigma/d\eta\,(W^+ \to l^+\nu) - d\sigma/d\eta\,(W^- \to l^-\bar{\nu})}{d\sigma/d\eta\,(W^+ \to l^+\nu) + d\sigma/d\eta\,(W^- \to l^-\bar{\nu})} \qquad (57) $$

which have recently been published [241,245-248]. The most precise measurements of the
asymmetry in $p\bar{p}$ collisions from D0 show some tension with the CDF measurements and
to some extent with other DIS data (see 3.5.1). At the LHC the spread of predictions for
this observable can be as much as a factor of two larger than the 90% CL uncertainty from
MSTW08 [236]. Fits to the di-muon production data in $\nu$ and $\bar{\nu}$ induced DIS prefer an enhanced
$s$ compared to $\bar{s}$ contribution (see section 3.5.5), although the significance of this finding
is weak. Since the contribution of $s/\bar{s}$ to Z and W production is large at the LHC (up to
20% and 27% respectively at NLO [249]) new LHC data could help resolve the issue and set
interesting constraints in the strange sector. First studies were carried out in [250] and pursued
in [213].
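
As a worked illustration of the asymmetry definition, the sketch below evaluates $A(\eta)$ bin by bin from $W^+$ and $W^-$ differential cross sections and propagates their uncertainties assuming they are uncorrelated. This is a simplification, since correlated systematic uncertainties partially cancel in the ratio in a real measurement. All numerical inputs are invented placeholders and the function name is hypothetical.

```python
import math

def charge_asymmetry(sigma_plus, sigma_minus, err_plus=0.0, err_minus=0.0):
    """Return (A, dA) with A = (s+ - s-) / (s+ + s-).

    dA uses first-order propagation of uncorrelated errors:
    dA/ds+ = 2 s- / (s+ + s-)^2 and dA/ds- = -2 s+ / (s+ + s-)^2.
    """
    total = sigma_plus + sigma_minus
    asym = (sigma_plus - sigma_minus) / total
    err = 2.0 * math.hypot(sigma_minus * err_plus, sigma_plus * err_minus) / total ** 2
    return asym, err

# (eta bin centre, dsigma/deta for W+, W-, and their errors), placeholder values in pb
bins = [
    (0.25, 600.0, 450.0, 12.0, 10.0),
    (1.25, 620.0, 430.0, 12.0, 10.0),
    (2.25, 610.0, 380.0, 15.0, 12.0),
]
for eta, sp, sm, ep, em in bins:
    asym, err = charge_asymmetry(sp, sm, ep, em)
    print(f"eta = {eta:.2f}: A = {asym:.3f} +/- {err:.3f}")
```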

W and Z cross sections. First measurements of the W and Z production cross sections in e
and $\mu$ decay channels at $\sqrt{s} = 7$ TeV are available [241,242,245] using ~35 pb$^{-1}$ integrated
luminosity recorded in 2010. The measurements are systematically limited and both experiments
have a precision of ~1% (excluding a 3-4% luminosity uncertainty). Fig. 36 shows
the correlation of the $W^+$ to $W^-$ production cross sections, and the $W^+ + W^-$ to Z production
from ATLAS. NNLO predictions compare favourably with the measurements within their
quoted uncertainties.

Figure 36: The measured and predicted production cross sections times branching ratio for $W^+$
vs. $W^-$ production (left), and for W vs. Z production (right). From [241].



Differential Drell-Yan measurements. The virtual $\gamma^*$ cross section below the Z resonance
provides complementary information to that obtained at the Z peak. At low $\gamma^*$ invariant mass
the electromagnetic couplings dominate, which suppresses the d-type contributions, whereas the
axial and vector EW couplings to the u and d quarks, $v^2 + a^2$, are of similar size (see Eq. 42).
Thus measurements at the Z resonance peak and of the low mass continuum are sensitive to
different combinations of u-type and d-type quarks.

Since the virtual boson rapidity is related to the ratio of the quark and anti-quark parton momentum
fractions, an interesting measurement is the y spectrum in $Z/\gamma^*$ interactions. At large
y the longitudinally boosted boson arises from increasingly asymmetric momentum fractions
of the $q, \bar{q}$ pair, which provides simultaneous access to the high x and low x kinematic regions.
Measurements of the low mass Drell-Yan cross section reach the region of very low $x \sim 10^{-4}$
for ATLAS and CMS, and $10^{-5}$ for LHCb. At very low mass however, fixed order calculations
are not yet stable (see 3.6.2). The PDF uncertainty for $M \sim 15$ GeV is estimated to be 3% at
NLO but the scale uncertainty can lead to variations of as much as 30% on the cross section
(taking $\mu_F = \mu_R = s$) which rapidly diminishes with increasing M, and could limit the use of
the lowest M data in PDF fits. At NNLO the scale uncertainty remains sizeable at about 4% but
can be reduced by choosing the scale appropriately such that the higher order contributions are
minimised [251].

The first measurements of the differential invariant mass spectrum from CMS [244] are shown
in Fig. 37 (left) spanning the range 15 < M < 600 GeV compared to NNLO predictions.
Preliminary measurements from LHCb down to M = 6 GeV have also been released [252]. Both
measurements are based on the 2010 datasets and have a moderate precision of 9% at
M = 15-20 GeV, which is expected to improve.

Differential spectra for W and Z production have been published by ATLAS [241], CMS [243]
and LHCb [245] and the LHCb measurements are shown in Fig. 37 (right) compared to NNLO 
predictions. The data, which in this case are based on the statistically limited 2010 data sample, 
are not yet of sufficient precision to significantly constrain the PDFs although some deviation 
between theory and measurement is observed for 2.5 < y < 3.0. It will be interesting to 
see how this develops with the new measurements with higher statistical precision and smaller 
systematic uncertainties. 

Figure 37: Left: The normalised differential Drell-Yan cross section vs. invariant mass of the
virtual boson. Right: the differential rapidity spectrum of $Z^0$ production. From [244, 245].



W charged lepton asymmetry. Measurements from ATLAS, CMS and LHCb of the W
charged lepton ($e + \mu$) asymmetry are presented in [241,245-248]. Fig. 38 shows the asymmetry
as determined by all three experiments compared to fixed order NLO and NNLO predictions
from several PDF groups. The CMS electron channel measurement is shown in Fig. 38 (left)
using 840 pb$^{-1}$ of integrated luminosity with a lepton $p_T$ cut of 35 GeV. The data are in good
agreement with most NLO predictions, but at low lepton rapidity $\eta_l$ the MSTW08 prediction
undershoots the data. The differences between PDF sets are more pronounced in the predicted
$W^+$ and $W^-$ rapidity spectra, leading the authors of [241] to argue that the individual spectra
are more sensitive than the lepton charge asymmetry. Better agreement between predictions and
measurements may be obtained when comparing to resummed calculations at next-to-next-to-leading
log order since these calculations give a better description of the W $p_T$ spectrum as
pointed out in [247].

The charged lepton asymmetry measurements from ATLAS, CMS and LHCb are shown together
in Fig. 38 (right) based on a smaller data set of 35-36 pb$^{-1}$ and less restrictive phase
space with the lepton $p_T$ required to be above 20 GeV. The NLO predictions of CTEQ6.6,
HERAPDF1.0 and MSTW08 are in reasonable agreement with the data shown, although here,
better agreement with MSTW08 is observed albeit within larger experimental uncertainties. Of
particular interest is the region accessed by the LHCb measurement for $\eta_l > 2.5$ where the
predictions are in agreement with each other and the data but with relatively large uncertainties.
Thus current and future measurements are expected to have a visible impact in reducing the
PDF uncertainties and improving the consistency between PDF sets at large and small $\eta_l$.
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Figure 38: Charged lepton asymmetry in W decays at the LHC. Left: CMS electron measure- 
ment for p T > 35 GeV with 840 pb' 1 from [248]. Right: combined electron and muon channel 
measurements from ATLAS, CMS and LHCb forp T > 20 GeV with 35 - 36 pb' 1 from [253]. 



4.3.2 Inclusive jet cross sections 



Differential inclusive jet cross sections $d^2\sigma/dy\,dp_T$ at $\sqrt{s} = 7$ TeV are available from 
ATLAS [254] and CMS [255] and an example of the data can be seen in Fig. 39. Even with the 
modest luminosity of ~ 35 pb$^{-1}$ the measurements extend to jet transverse momenta of about 
1.5 TeV. The wide $\eta$ range of the ATLAS and CMS calorimeters compared to the Tevatron 
experiments allows the jet cross sections to be measured up to high rapidities of 4.4. The measurements 
are sensitive to partonic momentum fractions of approximately $10^{-5} < x < 0.9$; however, the 
precision is limited by the knowledge of the detector calibration. Jet cross sections exhibit a 
very sharply falling jet $p_T$ spectrum (see for example Fig. 13), therefore small changes in the 
jet energy scale lead to large correlated shifts in the cross sections. Currently this leads to measurement 
uncertainties of about 10-60%, dominated by a scale uncertainty of 3-4% in the 
central detector regions for moderate jet $p_T$ and rising to ~ 12% at the highest y. 
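The amplification of a few-percent energy scale uncertainty into a much larger cross section uncertainty can be illustrated with a toy model. The sketch below assumes a pure power-law spectrum $d\sigma/dp_T \propto p_T^{-n}$; the exponent n = 6 and the function names are illustrative assumptions and are not taken from the measurements. If every jet $p_T$ is scaled by $(1+\delta)$, the cross section at fixed measured $p_T$ changes by $(1+\delta)^{n-1} - 1$, where one power of $(1+\delta)$ is absorbed by the Jacobian.

```python
def spectrum(pt, norm=1.0, n=6.0):
    """Toy falling spectrum d(sigma)/d(pT) = norm * pT**(-n) (illustrative assumption)."""
    return norm * pt ** (-n)


def shifted_spectrum(pt_measured, delta, norm=1.0, n=6.0):
    """Spectrum in the measured pT when every true jet pT is scaled by (1 + delta)."""
    pt_true = pt_measured / (1.0 + delta)
    # The factor 1 / (1 + delta) is the Jacobian d(pt_true) / d(pt_measured).
    return spectrum(pt_true, norm, n) / (1.0 + delta)


def relative_shift(pt_measured, delta, n=6.0):
    """Fractional change of the cross section at fixed measured pT."""
    nominal = spectrum(pt_measured, n=n)
    return shifted_spectrum(pt_measured, delta, n=n) / nominal - 1.0


for delta in (0.03, 0.04, 0.12):
    shift = relative_shift(500.0, delta)
    print(f"energy scale shift {delta:.0%} -> cross section shift {shift:.1%}")
```

For a pure power law the result does not depend on the chosen $p_T$; a real spectrum steepens towards the kinematic limit, so the amplification grows at high $p_T$ and high rapidity.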



4.3.3 The NNPDF2.3 PDFs 



Global fits that include the early LHC measurements described above have been performed by 
the NNPDF collaboration. 

Figure 39: Ratio of the measured inclusive jet cross section $d^2\sigma/dy\,dp_T$ to the theoretical 
prediction using CT10 PDFs. The predictions from MSTW08, NNPDF2.1 and HERAPDF1.5 are 
also shown. From [254]. 

In [256], the NNPDF2.1 NLO PDFs were updated using a reweighting technique to include 
the first W charged lepton asymmetry measurements from ATLAS and CMS, leading to the 
NNPDF2.2 set. 

Figure 40: Comparison of the strangeness and singlet distributions obtained from NNPDF2.3, 
which includes the LHC data, and from the same fit but restricted to the non-LHC measurements. 
The contours correspond to uncertainties at 68% confidence level. From [213]. 

In [213] new fits were performed which include, in addition to the non-LHC data used in 
NNPDF2.1, the published measurements from the LHC experiments for which the covariance 
matrix of the correlated systematic uncertainties has been provided: the W and Z lepton rapidity 
distributions measured by ATLAS [241] and LHCb [245] using the 2010 data, the W 
electron asymmetry measured by CMS in the 2011 dataset [248], and the ATLAS inclusive jet 
cross sections measured in the 2010 data [254]. The corresponding NNPDF2.3 PDFs have been 
determined both at NLO and at NNLO for a wide range of values of $\alpha_s$, in the same GM-VFNS 
scheme as used for NNPDF2.1 (see footnote 28), and are available in the LHAPDF interface. For the 
determination of NNLO PDFs, the NNLO predictions for electroweak boson production at the LHC 
have been obtained via K-factors. For inclusive jet production at the LHC, the NLO matrix elements 
have been used (together with NNLO PDFs and $\alpha_s$) instead of the approximation usually 
made to calculate NNLO jet production at the Tevatron (see section 3.6), because the threshold 
approximation is expected to be worse at the LHC energies. 
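The K-factor approach mentioned above can be sketched as follows. This is a minimal illustration in Python, assuming bin-by-bin factors $K_i = \sigma_i^{NNLO}/\sigma_i^{NLO}$ computed once with a reference PDF set and then applied to fast NLO predictions during the fit. The function names and all cross section values are hypothetical, and the details of how the K-factors were actually obtained in [213] are not given here.

```python
def k_factors(nnlo_ref, nlo_ref):
    """Bin-by-bin K = sigma_NNLO / sigma_NLO, computed once with a reference PDF set."""
    if len(nnlo_ref) != len(nlo_ref):
        raise ValueError("NNLO and NLO predictions must have the same binning")
    return [nnlo / nlo for nnlo, nlo in zip(nnlo_ref, nlo_ref)]


def approx_nnlo(nlo_current, k):
    """Approximate NNLO prediction for the current PDFs: K times the fast NLO result."""
    if len(nlo_current) != len(k):
        raise ValueError("prediction and K-factors must have the same binning")
    return [ki * nlo for ki, nlo in zip(k, nlo_current)]


# Hypothetical cross sections in pb for three rapidity bins (illustrative numbers only).
nlo_ref = [100.0, 90.0, 70.0]     # NLO with the reference PDF set
nnlo_ref = [103.0, 92.7, 71.4]    # NNLO with the same reference PDF set
nlo_trial = [98.0, 91.0, 69.0]    # NLO with a trial PDF set met during the fit

k = k_factors(nnlo_ref, nlo_ref)
print("K-factors:", [round(x, 3) for x in k])
print("approximate NNLO:", [round(x, 2) for x in approx_nnlo(nlo_trial, k)])
```

The approximation relies on the K-factors depending only weakly on the PDFs, so that they need not be recomputed at every step of the fit.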

The resulting NNPDF2.3 distributions provide a good description of all datasets included in the 
fit. Comparing these PDFs with those obtained from the same fit performed to the non-LHC 
data only (the NNPDF2.3-noLHC set already mentioned at the beginning of section 3.5) allows 
the specific impact of the LHC data to be gauged. This impact is so far moderate but already 
visible [213]: the uncertainty on the gluon distribution at high x is somewhat reduced thanks to the 
LHC jet data; the electroweak boson production data help improve the flavour decomposition; 
and the strangeness fraction of the light sea is pushed towards slightly higher values, although 
with a marginal statistical significance. As an example, Fig. 40 compares the strange and singlet 
distributions obtained from NNPDF2.3 and NNPDF2.3-noLHC. 

PDF fits based only on collider data PDFs derived from a fit restricted to data from collider 
experiments were also extracted in [213]. The motivation for this approach lies in the fact that the 
resulting PDFs are, by construction, independent of any nuclear or higher twist corrections that 
may affect some fixed target measurements and could explain some of the tensions reported 
in section 3.5.6. Restricting the fit to the collider measurements reduces the number of fitted data 
points by a factor of ~ 3. The resulting PDFs show no significant differences with those from 
NNPDF2.3, which indicates that any tension between collider and fixed target data can only be 
moderate. However, some distributions resulting from this fit show very large uncertainties. For 
example, the anti-quark PDFs at high x are very poorly constrained in such a fit, as shown by 
the dashed curves in Fig. 26 of section 3.5.3 (see footnote 29). 

4.3.4 Top production 

The production of $t\bar{t}$ pairs is dominated by gg fusion at the LHC, and at $\sqrt{s} = 14$ TeV this 
subprocess contributes 90% of the total cross section. Therefore this provides an interesting 
probe of the high x gluon, particularly at large $t\bar{t}$ invariant mass > 1 TeV. However, care should 
be taken in interpreting these cross sections which are also used to constrain many models of 
new physics coupling to the top quark. 

Latest measurements of the total production cross section based on 7 and 8 TeV centre-of-mass 
energy data from ATLAS and CMS have been reported in a variety of decay modes including 
single and di-lepton W decays as well as purely hadronic modes, and measurements using 
b tagged jets, for example [257-261] and references therein. The latest combinations of 
measurements are presented in [262, 263] and an experimental precision of ~ 5% is now achieved 
(excluding the luminosity uncertainty). Recent approximate NNLO predictions [264] and NLO 
predictions with next-to-next-to-leading log (NNLL) corrections [265] both have a similar 
accuracy of about ±4% for the scale variation uncertainty and ±5% for the PDF uncertainty 
evaluated using only the MSTW08 NNLO set at 90% CL. This estimated PDF uncertainty is 
smaller than the spread of different predictions as discussed in section 4.2, and the measurements are 
expected to constrain the differences between the PDF sets. 

28 PDFs obtained in the FFNS with $n_f = 4$ or $n_f = 5$ active flavours are also provided. 

29 The LHC data, not included in the collider fit illustrated in Fig. 26, do not reduce these uncertainties yet. 

A first measurement of the normalised differential $t\bar{t}$ cross section is now available, performed 
with a 2 fb$^{-1}$ data sample at $\sqrt{s} = 7$ TeV [266] in the single lepton ($e + \mu$) channel. The data are 
shown in Fig. 41 and compared to NLO and NLO+NNLL predictions. A precision of 10-20% 
is achieved, which is limited by uncertainties related to the jet energy scale and resolution. 

Figure 41: Normalised $t\bar{t}$ differential cross section at $\sqrt{s} = 7$ TeV compared to four NLO and 
NLO+NNLL predictions. From [266]. 



4.3.5 Prompt photon production 

The potential sensitivity of measurements of isolated photon hadro-production to the gluon 
density has been mentioned in section 2.4.4, in the context of pre-LHC experiments. In pp 
collisions at the LHC, the relative contribution of the QCD Compton process $qg \to \gamma q$ to prompt 
photon production is enhanced compared to $p\bar{p}$ collisions at the Tevatron, where 
$q\bar{q}$ annihilations $q\bar{q} \to \gamma g$ also play an important role. Moreover, in the large kinematic domain 
where the measurement can be performed at the LHC, the gluon density is involved in a broad 
range of Bjorken-$x$, from $O(10^{-3})$ at rapidities of $|\eta| \sim 2$ and low transverse energy to $O(0.1)$ 
at central pseudo-rapidities and high $E_T$ [170]. Hence, the impact of LHC prompt photon 
measurements on the gluon PDF is expected to be significant. 
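The quoted range of Bjorken-$x$ can be checked with leading-order kinematics. The Python sketch below uses the standard relation for a massless 2 -> 2 process, $x_{1,2} = (E_T/\sqrt{s})\,(e^{\pm\eta_\gamma} + e^{\pm\eta_{jet}})$. Placing the recoil jet at the same pseudo-rapidity as the photon is a simplifying assumption made here to obtain the smallest accessible $x$, and the $E_T$ values of 25 and 400 GeV are taken from the range of the CMS measurement quoted in this section.

```python
import math


def parton_x(et, eta_photon, eta_jet, sqrt_s):
    """Momentum fractions (x1, x2) of the incoming partons for LO massless 2 -> 2 kinematics.

    et and sqrt_s must be given in the same units (GeV here).
    """
    x1 = et / sqrt_s * (math.exp(eta_photon) + math.exp(eta_jet))
    x2 = et / sqrt_s * (math.exp(-eta_photon) + math.exp(-eta_jet))
    return x1, x2


SQRT_S = 7000.0  # GeV, pp collisions at 7 TeV

# Forward, low-E_T configuration (recoil jet assumed at the same pseudo-rapidity).
x1_fwd, x2_fwd = parton_x(25.0, 2.0, 2.0, SQRT_S)
# Central, high-E_T configuration.
x1_cen, x2_cen = parton_x(400.0, 0.0, 0.0, SQRT_S)

print(f"E_T = 25 GeV,  eta = 2: x1 = {x1_fwd:.4f}, x2 = {x2_fwd:.5f}")
print(f"E_T = 400 GeV, eta = 0: x1 = {x1_cen:.3f}, x2 = {x2_cen:.3f}")
```

Under these assumptions the smaller momentum fraction in the forward configuration is of order $10^{-3}$ and the central high-$E_T$ configuration gives $x$ of order 0.1, in line with the range stated in the text.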

First measurements of isolated prompt photon production have been published by the ATLAS 
[267] and CMS [268] experiments using pp data taken at $\sqrt{s} = 7$ TeV (see footnote 30), corresponding 
to an integrated luminosity of ~ 35 pb$^{-1}$. For example, the CMS measurement, made in four 
pseudo-rapidity regions and in the transverse energy range $25 < E_T^\gamma < 400$ GeV, is shown 
in Fig. 42. It is consistent with the NLO prediction from pQCD obtained from the JETPHOX 
program [270,271] using the CT10 PDFs. 

30 Measurements have also been made at $\sqrt{s} = 2.76$ TeV, in pp and in Pb-Pb collisions [269]. 

Figure 42: Left: The isolated prompt photon cross section measured in four pseudo-rapidity 
bins as a function of the photon transverse energy, together with the NLO QCD prediction. 
Right: Ratio of the measurement to the NLO prediction for the most central bin; the vertical 
error bars show the statistical uncertainties, while the shaded areas show the total errors (not 
including a 4% normalisation uncertainty). From [268]. 

In [170], the impact of these ATLAS and CMS data on the PDFs has been quantified using the 
NNPDF reweighting technique mentioned previously [256]. Including these data in a fit similar 
to NNPDF2.1 leads to a significant reduction of the uncertainty on the gluon PDF, of up to 20%, 
most pronounced for x ~ 0.01. Moreover, the fit does not significantly change the central value 
of the gluon density. This indicates that the constraints that these data place on the gluon PDF at 
high x are not in tension with the constraints obtained from the Tevatron jet data. 
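The idea behind reweighting can be sketched in a few lines of Python. The weight formula $w_k \propto (\chi^2_k)^{(n-1)/2}\,e^{-\chi^2_k/2}$, with $n$ the number of new data points and $\chi^2_k$ the agreement of replica $k$ with the new data, and the entropy-based effective number of replicas are those commonly quoted in the NNPDF reweighting literature. Everything else here is a toy: the function names, the replica values and the $\chi^2$ values are invented for illustration and do not reproduce the analysis of [170].

```python
import math


def nnpdf_weights(chi2, n_data):
    """Replica weights w_k ~ chi2_k^((n-1)/2) * exp(-chi2_k/2), normalised to sum to N."""
    log_w = [0.5 * (n_data - 1) * math.log(c) - 0.5 * c for c in chi2]
    shift = max(log_w)  # subtract the maximum to avoid numerical underflow
    raw = [math.exp(lw - shift) for lw in log_w]
    total = sum(raw)
    n_rep = len(chi2)
    return [n_rep * r / total for r in raw]


def effective_replicas(weights):
    """Shannon-entropy estimate of the number of replicas left after reweighting."""
    n_rep = len(weights)
    entropy = sum(w * math.log(n_rep / w) for w in weights if w > 0.0)
    return math.exp(entropy / n_rep)


def weighted_mean_std(values, weights):
    """Weighted central value and spread of an observable over the replica ensemble."""
    n_rep = len(values)
    mean = sum(w * v for w, v in zip(weights, values)) / n_rep
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / n_rep
    return mean, math.sqrt(var)


# Toy ensemble: gluon density at one x value for five replicas (arbitrary units).
gluon = [0.8, 0.9, 1.0, 1.1, 1.2]
n_data = 10
# Toy chi2 of each replica with respect to 10 hypothetical new data points.
chi2 = [n_data + ((g - 1.0) / 0.05) ** 2 for g in gluon]

weights = nnpdf_weights(chi2, n_data)
prior = weighted_mean_std(gluon, [1.0] * len(gluon))
posterior = weighted_mean_std(gluon, weights)
print("prior     mean, std:", prior)
print("posterior mean, std:", posterior)
print("effective replicas :", effective_replicas(weights))
```

In this toy the central value is unchanged while the spread shrinks, which mirrors the qualitative outcome described above: reduced uncertainty without a significant shift of the central gluon density.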

4.3.6 Cross section ratios 

The centre-of-mass energy of the LHC is being increased in a step-wise way with runs taken 
at $\sqrt{s} = 7$ TeV and 8 TeV; after the long shutdown in 2013-2014, the machine is expected 
to operate at ~ 14 TeV. This gives rise to the possibility of measuring cross section ratios at 
different $\sqrt{s}$ as well as double ratios of hard process cross sections. The advantage of these 
ratios is that experimentally many systematic uncertainties could cancel in the measurements. 
Cancellations in the theoretical uncertainties on the predictions are also expected [272], leading 
to very precise predictions and measurements. These could offer interesting new constraints on 
PDFs and enhanced sensitivity to new physics. Taking the ratio between the 7 and 8 TeV data, 
the W and Z production cross section ratios are predicted to an accuracy of ~ 0.2% including 
PDF, $\alpha_s$, and scale uncertainties at NNLO. However, for high mass $t\bar{t}$ production the predicted 
uncertainty on the ratio is estimated to be 1%, and jet production ratios for jets with $p_T > 1$ TeV 
are claimed to be known to ~ 2%, rising to ~ 6% for jet $p_T > 2$ TeV. Both of these processes 
probe the very high x PDFs and could therefore be used to constrain this region. The ratios 
between 8 and 14 TeV data offer even larger potential gains. It remains to be seen how well the 
experimental uncertainties on the measured ratios will cancel, but this is an interesting proposal 
warranting further, more detailed investigation. 
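The cancellation of uncertainties in a ratio follows from linear error propagation: for $R = \sigma_A/\sigma_B$ with relative uncertainties $\delta_A$, $\delta_B$ and correlation coefficient $\rho$, one has $(\delta R/R)^2 = \delta_A^2 + \delta_B^2 - 2\rho\,\delta_A\delta_B$. The Python sketch below evaluates this expression; the 4% input uncertainties and the correlation values are illustrative assumptions, not numbers taken from [272].

```python
import math


def ratio_relative_uncertainty(rel_a, rel_b, rho):
    """Relative uncertainty on R = A / B from first-order error propagation.

    rel_a, rel_b: relative uncertainties on A and B; rho: their correlation coefficient.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    variance = rel_a ** 2 + rel_b ** 2 - 2.0 * rho * rel_a * rel_b
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, variance))


# Illustrative case: a 4% uncertainty on the cross section at each of two energies.
for rho in (0.0, 0.9, 0.99, 1.0):
    unc = ratio_relative_uncertainty(0.04, 0.04, rho)
    print(f"rho = {rho:4.2f}: relative uncertainty on the ratio = {unc:.2%}")
```

The closer the correlation between the two energies is to one, the more completely a common uncertainty drops out, which is why the degree of correlation of the experimental systematics determines how useful the measured ratios will be.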



5 Conclusions and Outlook 

A lot of progress has been made over the past ~ 20 years in the understanding of proton structure 
and in the determination of the parton distribution functions. On the experimental side, 
the HERA collider has opened up the kinematic domain of low x; high $p_T$ jet production at 
the Tevatron has shed light on the gluon density at high x; the measurements from fixed target 
experiments have been finalised. On the theory side, the extraction of PDFs has become more 
and more involved. While the first QCD fits were made at leading order only, to a small number 
of $F_2$ data points, using simple parameterisations with a few parameters, current QCD fits are 
now available up to NNLO; they make use of about 3000 data points, covering all processes that 
are sensitive to proton PDFs; the parameterisations have typically 25-30 parameters (ten times 
more for the NNPDF fits); and uncertainties are now delivered together with the central fits. 
The crucial need for obtaining error bands for the PDFs has also led the experimental 
collaborations to publish their full correlated systematic uncertainties. This much improved knowledge 
of proton structure comes together with considerable progress in QCD phenomenology and theory: 
thanks to new calculation techniques, higher order calculations are now available for a wealth of 
processes; several resummed calculations also exist; the development of new subtraction 
techniques has allowed NLO calculations to be combined with the parton shower approach used in 
Monte Carlo generators; new jet algorithms have been defined that allow a better comparison of 
experimental measurements with theoretical calculations. As a result, the theoretical predictions for 
the processes that are, or will be, observed at the LHC are much more robust than they were a 
decade ago, at the start-up of Run II of the Tevatron. 

Although most proton PDFs are now determined to a good precision, some open issues remain. 
For example, the strange content of the proton is still very poorly known, all PDFs are affected 
by large uncertainties at high x, and what happens at very low x remains largely unknown. 
The data that are being collected by the LHC experiments will further improve our knowledge 
of proton structure - although the highest mass domain, which may be affected by physics 
processes not accounted for in the Standard Model, may not be best suited to constrain the 
PDFs at highest x. 

In addition, other aspects of proton structure, which were not addressed in this review, are far 
from being understood. The proton spin is one of those. Since the surprising finding by the 
EMC Collaboration that very little of the proton spin is carried by the spins of quarks and 
anti-quarks, this issue has been tackled by several experiments, such as (to mention only the most recent 
ones) the COMPASS experiment at CERN, HERMES at DESY, CLAS at JLab, and the STAR 
and PHENIX experiments at RHIC. One focus was the measurement of the contribution 
to the proton spin that is carried by gluons. There is currently no experimental evidence that 
this contribution may be important; however, the uncertainties are large. Another issue regards 
the transverse structure: the standard PDFs probe the longitudinal momentum of the partons 
in a fast moving hadron, and all information about the transverse structure is integrated over. This 
additional information is encoded within the Generalised Parton Distributions (GPDs), which 
unify the concept of PDFs and that of hadronic form factors [273]. The GPDs, which can be 
accessed via exclusive processes such as Deeply Virtual Compton Scattering (DVCS) $lp \to lp\gamma$, are 
poorly constrained so far. 

The study of proton structure has a continuing programme over the next decade with, besides 
the LHC experiments, several new experiments and facilities in the planning, construction or 
starting phase. Some of these are designed to focus on the high x region at low and moderate 
$Q^2$ whereas others are designed to open a wider kinematic region than is currently accessible. 
They are briefly described below. 

The Minerva experiment (E938) at Fermilab [274] The Minerva detector is operated on 
the NuMI neutrino beamline. Its main goal is to perform precision measurements of neutrino 
scattering off several targets in the low energy regime, $E_\nu$ ~ 1-20 GeV, which are needed by 
experiments studying neutrino oscillations. Following a low-energy run which ended in May 
2012, data taken starting from 2013 with a higher energy beam will allow CC DIS to be further 
studied. The measurements should shed further light on the d/u ratio at high x. 

The Drell-Yan experiment E906/Seaquest [275] This Fermilab experiment continues the series 
of fixed target pp and pd Drell-Yan measurements from E605, E772 and E866. Seaquest will 
operate with a 120 GeV proton beam delivering an instantaneous luminosity of $10^{35}$ cm$^{-2}$s$^{-1}$, 
some 50 times the luminosity of E866. This will allow measurements of the $\bar{d}/\bar{u}$ ratio to be made 
with a factor of 10 improvement in precision in the region of 0.25 < x < 0.45. The experiment 
will commence physics runs in 2013. 

The COMPASS experiment at CERN has a programme extending until 2016 (see [276]). 
In particular it will perform further measurements of DVCS in 2015-2016 (this requires major 
rearrangements of the spectrometer and the installation of a recoil detector). Measuring the 
dependence on t, the momentum transferred at the proton vertex, will give access to the nucleon 
transverse size. Combined with the HERA data and the future JLab data (see below), a 
comprehensive picture of the evolution of the nucleon's transverse size with $x_{Bjorken}$ will be achieved. 
Information on GPDs will also be obtained. 

The upgrade of the accelerator complex at JLab [277] The Jefferson Laboratory hosts the 
CEBAF dual linac electron accelerator currently operating at 6 GeV delivering beam to three 
experimental halls. The machine is being upgraded to operate at 12 GeV and instantaneous 
luminosities of $10^{35}$-$10^{39}$ cm$^{-2}$s$^{-1}$, which is necessary to explore the region of high $x \sim 0.7$ and 
$Q^2 < 8$ GeV$^2$. Each hall houses one or more experiments and a fourth hall is under construction. 
The experiments cover a variety of measurements of nuclear and proton DIS including precision 
measurements of $F_L$ and $F_2$ at high x, measurements of the neutron $F_2$ which will constrain the d/u 
quark ratio at high x, DVCS measurements, as well as polarised scattering. The staged upgrade 
programme is underway and expected to commence physics operation in 2015. 

The Electron Ion Collider [278] On the horizon of ~ 2020, a future EIC could be realised 
as an upgrade to the existing RHIC ion collider (eRHIC). A 5-30 GeV polarised electron 
beam would collide with polarised ion beams reaching a maximum of 325 GeV for protons, 
and 130 GeV/A for heavier nuclei. Another option (MEIC / ELIC) would be to use the 
polarised electron beam of JLab and add a new ring for polarised protons or ions. The EIC would 
be the first lepton-proton collider with a polarised proton beam. It would shed further light on 
the proton's spin problem. The contribution of gluons to the proton's spin will be measured 
precisely by an EIC. If this contribution turns out to be small, as indicated by the current 
experimental data, it would mean that a large part of the proton spin is due to orbital angular 
momentum. The measurement of DVCS and of other exclusive processes such as $J/\psi$ production, 
off transversely polarised protons, will bring information on GPDs at low x. Together with the 
information on GPDs obtained, at higher x, by other experiments, it may then be possible to 
have direct access to the parton angular momentum via Ji's angular momentum sum rule [279] 
(this requires the GPDs to be reconstructed in a large kinematic domain). Moreover, the EIC 
physics programme also includes measurements of unpolarised proton and deuteron scattering 
at low x, measurements of $F_L$, and semi-inclusive DIS measurements sensitive to the $s/\bar{s}$ content 
of the proton. 

The LHeC project [280] This is a novel proposal to build a ring or linac electron machine 
to collide with an LHC proton/ion beam using interaction point IP2 in the LHC tunnel. The 
LHC and LHeC could run simultaneously with operation commencing in 2023 or later, after 
the long shutdown in preparation for LHC high luminosity running. The electron ring operating 
at 60 GeV and $\sqrt{s} = 1.3$ TeV could offer a luminosity of $10^{33}$ cm$^{-2}$s$^{-1}$, a factor of 20 higher 
than HERA. A linac option could achieve higher $\sqrt{s}$ but the luminosity at these higher energies 
would be smaller. The physics programme would cover the measurement of precision NC and 
CC structure functions with a 20-fold increase in kinematic reach for $Q^2$ and 1/x compared to 
HERA, improved accuracy in the determination of $\alpha_s$, and the understanding of saturation and 
of non-linear dynamics. 
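The centre-of-mass energies of the proposed lepton-hadron colliders follow from the beam energies through $\sqrt{s} \approx 2\sqrt{E_e E_p}$, valid for head-on collisions when the particle masses are neglected. The Python sketch below applies this to the beam energies quoted above for the EIC and the LHeC. The 7 TeV LHC proton energy and the HERA beam energies (27.5 GeV electrons on 920 GeV protons) are inputs assumed here and are not stated in this section.

```python
import math


def ep_sqrt_s(e_lepton, e_hadron):
    """Centre-of-mass energy in GeV for head-on collisions, neglecting masses."""
    return 2.0 * math.sqrt(e_lepton * e_hadron)


# Beam energies in GeV: (lepton, proton).
machines = {
    "HERA (assumed 27.5 x 920)": (27.5, 920.0),
    "EIC / eRHIC (30 x 325)": (30.0, 325.0),
    "LHeC ring (60 x 7000, assumed LHC design energy)": (60.0, 7000.0),
}

for name, (e_lepton, e_hadron) in machines.items():
    print(f"{name}: sqrt(s) = {ep_sqrt_s(e_lepton, e_hadron):.0f} GeV")

# The maximum Q^2 equals s, so the gain in kinematic reach scales with the ratio of s values.
s_ratio = (ep_sqrt_s(60.0, 7000.0) / ep_sqrt_s(27.5, 920.0)) ** 2
print(f"s(LHeC) / s(HERA) = {s_ratio:.1f}")
```

With these inputs the 60 GeV electron ring reproduces the quoted $\sqrt{s}$ of about 1.3 TeV, and the ratio of $s$ values is of the same order as the 20-fold increase in kinematic reach mentioned above.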

The long term future of DIS experiments is not yet clear with the last two projects described 
above still being discussed within the appropriate committees. Nevertheless, our knowledge of 
proton structure and QCD is expected to improve significantly within the next decade, alongside 
more general developments across the field of particle physics. Such developments will come 
in particular from the LHC experiments which, at the time of writing, have just announced the 
discovery of a new particle in their searches for the Standard Model Higgs boson. 